Business communities in the United States are facing high demand for human resources, but a constant challenge is identifying and attracting the right talent, which is perhaps the most important element in remaining competitive. Companies in the United States look for hard-working, talented, and qualified individuals both locally and abroad.
The Immigration and Nationality Act (INA) of the US permits foreign workers to come to the United States to work on either a temporary or permanent basis. The act also protects US workers against adverse impacts on their wages or working conditions by ensuring US employers' compliance with statutory requirements when they hire foreign workers to fill workforce shortages. The immigration programs are administered by the Office of Foreign Labor Certification (OFLC).
OFLC processes job certification applications for employers seeking to bring foreign workers into the United States and grants certifications in those cases where employers can demonstrate that there are not sufficient US workers available to perform the work at wages that meet or exceed the wage paid for the occupation in the area of intended employment.
In FY 2016, the OFLC processed 775,979 employer applications for 1,699,957 positions for temporary and permanent labor certifications. This was a nine percent increase in the overall number of processed applications from the previous year. The process of reviewing every case is becoming a tedious task as the number of applicants is increasing every year.
The growing number of applicants every year calls for a machine-learning-based solution that can help shortlist the candidates with higher chances of visa approval. OFLC has hired your firm, EasyVisa, for data-driven solutions. As a data scientist, you have to analyze the data provided and, with the help of a classification model:
The data contains the different attributes of the employee and the employer. The detailed data dictionary is given below.
#import sys
#!{sys.executable} -m pip install pandas-profiling
# Libraries to help with reading and manipulating data
import numpy as np
import pandas as pd
# Libraries that support data visualization
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set()
# Libraries to suppress warnings
import warnings
warnings.filterwarnings("ignore")
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To split the data into train and test sets and to tune different models
from sklearn.model_selection import train_test_split, GridSearchCV
# To build models for prediction
import statsmodels.api as sm
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
# To install the xgboost library use: !pip install xgboost
from xgboost import XGBClassifier
# To get different metric scores
import sklearn.metrics as metrics
from sklearn.metrics import (
    mean_absolute_error,
    mean_squared_error,
    r2_score,
    f1_score,
    accuracy_score,
    recall_score,
    precision_score,
    confusion_matrix,
    roc_auc_score,
    plot_confusion_matrix,
    precision_recall_curve,
    roc_curve,
    make_scorer,
)
url = "EasyVisa.csv"
visaData = pd.read_csv(url, index_col = 0)
visaData.head(10)  # first 10 rows; missing values are checked below
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | N | N | 14513 | 2007 | West | 592.2029 | Hour | Y | Denied |
| EZYV02 | Asia | Master's | Y | N | 2412 | 2002 | Northeast | 83425.6500 | Year | Y | Certified |
| EZYV03 | Asia | Bachelor's | N | Y | 44444 | 2008 | West | 122996.8600 | Year | Y | Denied |
| EZYV04 | Asia | Bachelor's | N | N | 98 | 1897 | West | 83434.0300 | Year | Y | Denied |
| EZYV05 | Africa | Master's | Y | N | 1082 | 2005 | South | 149907.3900 | Year | Y | Certified |
| EZYV06 | Asia | Master's | Y | N | 2339 | 2012 | South | 78252.1400 | Year | Y | Certified |
| EZYV07 | Asia | Bachelor's | N | N | 4985 | 1994 | South | 53635.3900 | Year | Y | Certified |
| EZYV08 | North America | Bachelor's | Y | N | 3035 | 1924 | West | 418.2298 | Hour | Y | Denied |
| EZYV09 | Asia | Bachelor's | N | N | 4810 | 2012 | Midwest | 74362.1900 | Year | Y | Certified |
| EZYV10 | Europe | Doctorate | Y | N | 2251 | 1995 | South | 67514.7600 | Year | Y | Certified |
# copying data to another data frame to avoid changes in the original data
data_new = visaData.copy()
data_new.shape
(25480, 11)
# Selecting duplicate rows except first
# occurrence based on all columns
duplicate = data_new[data_new.duplicated()]
duplicate
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
data_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 25480 entries, EZYV01 to EZYV25480
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   continent              25480 non-null  object
 1   education_of_employee  25480 non-null  object
 2   has_job_experience     25480 non-null  object
 3   requires_job_training  25480 non-null  object
 4   no_of_employees        25480 non-null  int64
 5   yr_of_estab            25480 non-null  int64
 6   region_of_employment   25480 non-null  object
 7   prevailing_wage        25480 non-null  float64
 8   unit_of_wage           25480 non-null  object
 9   full_time_position     25480 non-null  object
 10  case_status            25480 non-null  object
dtypes: float64(1), int64(2), object(8)
memory usage: 2.3+ MB
data_new.describe().T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_employees | 25480.0 | 5667.043210 | 22877.928848 | -26.0000 | 1022.00 | 2109.00 | 3504.0000 | 602069.00 |
| yr_of_estab | 25480.0 | 1979.409929 | 42.366929 | 1800.0000 | 1976.00 | 1997.00 | 2005.0000 | 2016.00 |
| prevailing_wage | 25480.0 | 74455.814592 | 52815.942327 | 2.1367 | 34015.48 | 70308.21 | 107735.5125 | 319210.27 |
data_new.describe(exclude='number').T
| | count | unique | top | freq |
|---|---|---|---|---|
| continent | 25480 | 6 | Asia | 16861 |
| education_of_employee | 25480 | 4 | Bachelor's | 10234 |
| has_job_experience | 25480 | 2 | Y | 14802 |
| requires_job_training | 25480 | 2 | N | 22525 |
| region_of_employment | 25480 | 5 | Northeast | 7195 |
| unit_of_wage | 25480 | 4 | Year | 22962 |
| full_time_position | 25480 | 2 | Y | 22773 |
| case_status | 25480 | 2 | Certified | 17018 |
data_new.isna().sum()
continent                0
education_of_employee    0
has_job_experience       0
requires_job_training    0
no_of_employees          0
yr_of_estab              0
region_of_employment     0
prevailing_wage          0
unit_of_wage             0
full_time_position       0
case_status              0
dtype: int64
category = ['continent', 'education_of_employee', 'has_job_experience', 'requires_job_training', 'region_of_employment',
            'unit_of_wage', 'full_time_position', 'case_status']
# print the frequency of every level for each categorical column
for column in category:
    print(data_new[column].value_counts())
    print('_'*40)
Asia             16861
Europe            3732
North America     3292
South America      852
Africa             551
Oceania            192
Name: continent, dtype: int64
________________________________________
Bachelor's     10234
Master's        9634
High School     3420
Doctorate       2192
Name: education_of_employee, dtype: int64
________________________________________
Y    14802
N    10678
Name: has_job_experience, dtype: int64
________________________________________
N    22525
Y     2955
Name: requires_job_training, dtype: int64
________________________________________
Northeast    7195
South        7017
West         6586
Midwest      4307
Island        375
Name: region_of_employment, dtype: int64
________________________________________
Year     22962
Hour      2157
Week       272
Month       89
Name: unit_of_wage, dtype: int64
________________________________________
Y    22773
N     2707
Name: full_time_position, dtype: int64
________________________________________
Certified    17018
Denied        8462
Name: case_status, dtype: int64
________________________________________
data_new.head()
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | N | N | 14513 | 2007 | West | 592.2029 | Hour | Y | Denied |
| EZYV02 | Asia | Master's | Y | N | 2412 | 2002 | Northeast | 83425.6500 | Year | Y | Certified |
| EZYV03 | Asia | Bachelor's | N | Y | 44444 | 2008 | West | 122996.8600 | Year | Y | Denied |
| EZYV04 | Asia | Bachelor's | N | N | 98 | 1897 | West | 83434.0300 | Year | Y | Denied |
| EZYV05 | Africa | Master's | Y | N | 1082 | 2005 | South | 149907.3900 | Year | Y | Certified |
print(data_new.unit_of_wage.unique())
['Hour' 'Year' 'Week' 'Month']
print(data_new.region_of_employment.unique())
['West' 'Northeast' 'South' 'Midwest' 'Island']
print(data_new.education_of_employee.unique())
['High School' "Master's" "Bachelor's" 'Doctorate']
print(data_new.continent.unique())
['Asia' 'Africa' 'North America' 'Europe' 'South America' 'Oceania']
data_new.describe(include='all').T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| continent | 25480 | 6 | Asia | 16861 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| education_of_employee | 25480 | 4 | Bachelor's | 10234 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| has_job_experience | 25480 | 2 | Y | 14802 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| requires_job_training | 25480 | 2 | N | 22525 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| no_of_employees | 25480.0 | NaN | NaN | NaN | 5667.04321 | 22877.928848 | -26.0 | 1022.0 | 2109.0 | 3504.0 | 602069.0 |
| yr_of_estab | 25480.0 | NaN | NaN | NaN | 1979.409929 | 42.366929 | 1800.0 | 1976.0 | 1997.0 | 2005.0 | 2016.0 |
| region_of_employment | 25480 | 5 | Northeast | 7195 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| prevailing_wage | 25480.0 | NaN | NaN | NaN | 74455.814592 | 52815.942327 | 2.1367 | 34015.48 | 70308.21 | 107735.5125 | 319210.27 |
| unit_of_wage | 25480 | 4 | Year | 22962 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| full_time_position | 25480 | 2 | Y | 22773 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| case_status | 25480 | 2 | Certified | 17018 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
['continent', 'education_of_employee', 'region_of_employment', 'unit_of_wage']
data_new.continent = data_new.continent.astype('category')
data_new.education_of_employee = data_new.education_of_employee.astype('category')
data_new.region_of_employment = data_new.region_of_employment.astype('category')
data_new.unit_of_wage = data_new.unit_of_wage.astype('category')
data_new.has_job_experience = data_new.has_job_experience.astype('category')
data_new.requires_job_training = data_new.requires_job_training.astype('category')
data_new.full_time_position = data_new.full_time_position.astype('category')
data_new.case_status = data_new.case_status.astype('category')
data_new.has_job_experience.value_counts(dropna=False)
Y    14802
N    10678
Name: has_job_experience, dtype: int64
data_new.info()
<class 'pandas.core.frame.DataFrame'>
Index: 25480 entries, EZYV01 to EZYV25480
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   continent              25480 non-null  category
 1   education_of_employee  25480 non-null  category
 2   has_job_experience     25480 non-null  category
 3   requires_job_training  25480 non-null  category
 4   no_of_employees        25480 non-null  int64
 5   yr_of_estab            25480 non-null  int64
 6   region_of_employment   25480 non-null  category
 7   prevailing_wage        25480 non-null  float64
 8   unit_of_wage           25480 non-null  category
 9   full_time_position     25480 non-null  category
 10  case_status            25480 non-null  category
dtypes: category(8), float64(1), int64(2)
memory usage: 996.6+ KB
The object variables are converted to categorical variables.
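The eight separate `astype` calls could also be written as one pass over the non-numeric columns. The block below is only a minimal sketch on a made-up three-row frame; the name `toy` and its values are hypothetical and are not taken from EasyVisa.csv.

```python
import pandas as pd

# Hypothetical miniature of the visa data; values are illustrative only.
toy = pd.DataFrame(
    {
        "continent": ["Asia", "Europe", "Asia"],
        "has_job_experience": ["Y", "N", "Y"],
        "no_of_employees": [14513, 2412, 98],
    }
)

# Pick every column that is not numeric (the string columns in this frame).
text_cols = [c for c in toy.columns if not pd.api.types.is_numeric_dtype(toy[c])]

# Convert all of them to the pandas category dtype in one call.
toy = toy.astype({c: "category" for c in text_cols})

# List the columns that now carry a categorical dtype.
converted = [c for c in toy.columns if isinstance(toy[c].dtype, pd.CategoricalDtype)]
print(converted)
```

On the real frame the same pattern would apply to `data_new`, assuming every non-numeric column is meant to become categorical.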
Questions:
Those with higher education may want to travel abroad for a well-paid job. Does education play a role in Visa certification?
How does the visa status vary across different continents?
Experienced professionals might look abroad for opportunities to improve their lifestyles and career development. Does work experience influence visa status?
In the United States, employees are paid at different intervals. Which pay unit is most likely to be certified for a visa?
The US government has established a prevailing wage to protect local talent and foreign workers. How does the visa status change with the prevailing wage?
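Questions of this kind are usually easier to answer with rates than with raw counts, because the groups differ greatly in size. One possible approach, sketched here on a small hypothetical sample (the frame `toy` is invented for illustration), is a row-normalized crosstab:

```python
import pandas as pd

# Hypothetical sample; in the notebook the two inputs would be
# df.education_of_employee and df.case_status.
toy = pd.DataFrame(
    {
        "education_of_employee": [
            "Doctorate", "Doctorate",
            "High School", "High School", "High School",
            "Master's",
        ],
        "case_status": [
            "Certified", "Certified",
            "Denied", "Denied", "Certified",
            "Certified",
        ],
    }
)

# normalize="index" divides each row by its row total,
# so every row shows the share of Certified and Denied cases in that group.
rates = pd.crosstab(
    toy["education_of_employee"], toy["case_status"], normalize="index"
)
print(rates.round(3))
```

Each row of `rates` sums to 1, which makes groups of very different sizes directly comparable.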
# copying data to another data frame to avoid changes in the original data
df = data_new.copy()
sns.histplot(df.no_of_employees, kde = True)
<AxesSubplot:xlabel='no_of_employees', ylabel='Count'>
sns.boxplot(x=df.no_of_employees);  # pass the column by keyword for a horizontal boxplot
sns.histplot(df.yr_of_estab, kde = True)
<AxesSubplot:xlabel='yr_of_estab', ylabel='Count'>
sns.boxplot(x=df.yr_of_estab);  # pass the column by keyword for a horizontal boxplot
sns.histplot(df.prevailing_wage, kde = True)
<AxesSubplot:xlabel='prevailing_wage', ylabel='Count'>
sns.boxplot(x=df.prevailing_wage);  # pass the column by keyword for a horizontal boxplot
def labeled_barplot(data, feature, perc=False, n=None):
    # Bar plot of a categorical column with a count or percentage label above each bar
    # perc: show percentages instead of counts; n: show only the top n levels (None shows all)
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=data[feature].value_counts().index[:n].sort_values(),
    )
    for p in ax.patches:
        if perc:
            # percentage of each level of the category
            label = "{:.1f}%".format(100 * p.get_height() / total)
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal centre of the bar
        y = p.get_height()  # height of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )
    plt.show()
labeled_barplot(df, "continent", perc=True)
labeled_barplot(df, "education_of_employee", perc=True)
labeled_barplot(df, "has_job_experience", perc=True)
labeled_barplot(df, "requires_job_training", perc=True)
labeled_barplot(df, "region_of_employment", perc=True)
labeled_barplot(df, "unit_of_wage", perc=True)
labeled_barplot(df, "full_time_position", perc=True)
labeled_barplot(df, "case_status", perc=True)
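# correlations are only defined for the numeric columns, so select them explicitly
sns.heatmap(df.select_dtypes(include="number").corr(), cmap="YlGnBu", vmin=-1, vmax=1, annot=True)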
<AxesSubplot:>
sns.pairplot(df);
pd.crosstab(df.continent,df.case_status)
| continent | Certified | Denied |
|---|---|---|
| Africa | 397 | 154 |
| Asia | 11012 | 5849 |
| Europe | 2957 | 775 |
| North America | 2037 | 1255 |
| Oceania | 122 | 70 |
| South America | 493 | 359 |
sns.countplot(x="continent", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.education_of_employee,df.case_status)
| education_of_employee | Certified | Denied |
|---|---|---|
| Bachelor's | 6367 | 3867 |
| Doctorate | 1912 | 280 |
| High School | 1164 | 2256 |
| Master's | 7575 | 2059 |
sns.countplot(x="education_of_employee", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.has_job_experience,df.case_status)
| has_job_experience | Certified | Denied |
|---|---|---|
| N | 5994 | 4684 |
| Y | 11024 | 3778 |
sns.countplot(x="has_job_experience", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.requires_job_training,df.case_status)
| requires_job_training | Certified | Denied |
|---|---|---|
| N | 15012 | 7513 |
| Y | 2006 | 949 |
sns.countplot(x="requires_job_training", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.region_of_employment,df.case_status)
| region_of_employment | Certified | Denied |
|---|---|---|
| Island | 226 | 149 |
| Midwest | 3253 | 1054 |
| Northeast | 4526 | 2669 |
| South | 4913 | 2104 |
| West | 4100 | 2486 |
sns.countplot(x="region_of_employment", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.unit_of_wage,df.case_status)
| unit_of_wage | Certified | Denied |
|---|---|---|
| Hour | 747 | 1410 |
| Month | 55 | 34 |
| Week | 169 | 103 |
| Year | 16047 | 6915 |
sns.countplot(x="unit_of_wage", hue="case_status", data=df, palette='Set1',saturation=50 );
pd.crosstab(df.full_time_position,df.case_status)
| full_time_position | Certified | Denied |
|---|---|---|
| N | 1855 | 852 |
| Y | 15163 | 7610 |
sns.countplot(x="full_time_position", hue="case_status", data=df, palette='Set1',saturation=50 );
plt.figure(figsize=(15,5))
ax = sns.barplot(x='case_status', y='no_of_employees', data=df)
plt.figure(figsize=(15,5))
ax = sns.barplot(x='case_status', y='prevailing_wage', data=df)
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_employees', y = df['yr_of_estab'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='yr_of_estab', y = df['prevailing_wage'], data = df)
fig = plt.figure(figsize= (10,5))
ax = sns.scatterplot(x ='no_of_employees', y = df['prevailing_wage'], data = df)
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="no_of_employees", y="yr_of_estab", data=df, hue='case_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="prevailing_wage", y="yr_of_estab", data=df, hue='case_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="prevailing_wage", y="no_of_employees", data=df, hue='case_status', palette='tab10' )
plt.show()
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='continent', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='education_of_employee', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='has_job_experience', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='requires_job_training', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.pointplot(x ='requires_job_training', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='region_of_employment', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.lineplot(x ='region_of_employment', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='unit_of_wage', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.pointplot(x ='unit_of_wage', y = 'no_of_employees', data = df, hue = 'case_status')
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='full_time_position', y = 'no_of_employees', data = df, hue = 'case_status')
### distribution of plot across each case_status - violin plot
ax = sns.violinplot(x =df.case_status, y = df['yr_of_estab'])
ax = sns.violinplot(x =df.case_status, y = df['no_of_employees'])
There are more certified applicants than denied ones (17,018 versus 8,462).
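The class balance can be quantified directly from the `case_status` counts printed earlier (17,018 Certified and 8,462 Denied). The snippet below is a small sketch that re-enters those two counts by hand instead of reading the CSV.

```python
import pandas as pd

# Counts copied from the case_status value_counts output shown earlier.
counts = pd.Series({"Certified": 17018, "Denied": 8462}, name="case_status")

# Share of each class among all 25,480 cases.
shares = counts / counts.sum()
print(shares.round(3))
```

Roughly two thirds of the cases are certified, so the target is moderately imbalanced. That is worth keeping in mind when accuracy is later compared with recall, precision, or F1.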
labeled_barplot(df, "education_of_employee", perc=True)
sns.countplot(x="education_of_employee", hue="case_status", data=df, palette='Set1',saturation=50 );
sns.countplot(x="continent", hue="case_status", data=df, palette='Set1',saturation=50 );
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='continent', y = 'no_of_employees', data = df, hue = 'case_status')
labeled_barplot(df, "has_job_experience", perc=True)
sns.countplot(x="has_job_experience", hue="case_status", data=df, palette='Set1',saturation=50 );
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='has_job_experience', y = 'no_of_employees', data = df, hue = 'case_status')
Applicants with job experience are certified more often than those without: 11,024 of 14,802 (about 74%) versus 5,994 of 10,678 (about 56%). case_status therefore does appear to depend on has_job_experience.
sns.countplot(x="unit_of_wage", hue="case_status", data=df, palette='Set1',saturation=50 );
plt.figure(figsize=(15,5))
ax = sns.barplot(x='case_status', y='prevailing_wage', data=df)
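Certified applicants have a higher mean prevailing_wage than denied applicants. Note that the wages are still in mixed units at this point (hourly, weekly, monthly, and yearly), so this comparison is only indicative.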
# let's create a copy of the data
df1 = df.copy()
np.random.seed(1)
df1.sample(n=10)  # return a random sample of 10 rows from df1
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV17640 | Asia | Bachelor's | Y | N | 567 | 1992 | Midwest | 26842.9100 | Year | Y | Certified |
| EZYV23952 | Oceania | Bachelor's | N | N | 619 | 1938 | Midwest | 66419.9800 | Year | Y | Certified |
| EZYV8626 | Asia | Master's | N | N | 2635 | 2005 | South | 887.2921 | Hour | Y | Certified |
| EZYV20207 | Asia | Bachelor's | Y | Y | 3184 | 1986 | Northeast | 49435.8000 | Year | Y | Certified |
| EZYV7472 | Europe | Bachelor's | Y | N | 4681 | 1928 | West | 49865.1900 | Year | Y | Denied |
| EZYV3434 | Asia | Bachelor's | Y | N | 222 | 1989 | South | 813.7261 | Hour | Y | Certified |
| EZYV24441 | Europe | High School | N | Y | 3278 | 1994 | South | 204948.3900 | Year | Y | Denied |
| EZYV12105 | Asia | Master's | Y | N | 1359 | 1997 | West | 202237.0400 | Year | N | Certified |
| EZYV15657 | Asia | Bachelor's | N | N | 2081 | 2003 | West | 111713.0200 | Year | Y | Denied |
| EZYV23111 | North America | Bachelor's | Y | N | 854 | 1998 | Northeast | 444.8257 | Hour | Y | Denied |
df1.isnull().sum().sort_values(ascending = False)
#Return the count of missing values column-wise and sort them in descending order
continent                0
education_of_employee    0
has_job_experience       0
requires_job_training    0
no_of_employees          0
yr_of_estab              0
region_of_employment     0
prevailing_wage          0
unit_of_wage             0
full_time_position       0
case_status              0
dtype: int64
There are no missing values. Hence, there is no need for missing value treatment.
# Selecting duplicate rows except first
# occurrence based on all columns
duplicate_new = df1[df1.duplicated()]
duplicate_new
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
There are no duplicate rows.
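The two checks above (missing values and duplicate rows) can be bundled into one small report so they are easy to rerun after each preprocessing step. This is a sketch only: `quality_report` is a hypothetical helper name and the sample frame is invented.

```python
import pandas as pd


def quality_report(frame: pd.DataFrame) -> dict:
    """Count missing cells and fully duplicated rows (hypothetical helper)."""
    return {
        "missing_cells": int(frame.isnull().sum().sum()),
        # duplicated() flags every repeat of a row except its first occurrence
        "duplicate_rows": int(frame.duplicated().sum()),
    }


# Invented sample with one missing cell and one repeated row.
toy = pd.DataFrame(
    {
        "continent": ["Asia", "Asia", None],
        "no_of_employees": [98, 98, 1082],
    }
)

report = quality_report(toy)
print(report)
```

For `df1`, both entries would be expected to be zero, matching the outputs shown above.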
df1
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | N | N | 14513 | 2007 | West | 592.2029 | Hour | Y | Denied |
| EZYV02 | Asia | Master's | Y | N | 2412 | 2002 | Northeast | 83425.6500 | Year | Y | Certified |
| EZYV03 | Asia | Bachelor's | N | Y | 44444 | 2008 | West | 122996.8600 | Year | Y | Denied |
| EZYV04 | Asia | Bachelor's | N | N | 98 | 1897 | West | 83434.0300 | Year | Y | Denied |
| EZYV05 | Africa | Master's | Y | N | 1082 | 2005 | South | 149907.3900 | Year | Y | Certified |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| EZYV25476 | Asia | Bachelor's | Y | Y | 2601 | 2008 | South | 77092.5700 | Year | Y | Certified |
| EZYV25477 | Asia | High School | Y | N | 3274 | 2006 | Northeast | 279174.7900 | Year | Y | Certified |
| EZYV25478 | Asia | Master's | Y | N | 1121 | 1910 | South | 146298.8500 | Year | N | Certified |
| EZYV25479 | Asia | Master's | Y | Y | 1918 | 1887 | West | 86154.7700 | Year | Y | Certified |
| EZYV25480 | Asia | Bachelor's | Y | N | 3195 | 1960 | Midwest | 70876.9100 | Year | Y | Certified |
25480 rows × 11 columns
# Let's look at the statistical summary of the data
df1.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| continent | 25480 | 6 | Asia | 16861 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| education_of_employee | 25480 | 4 | Bachelor's | 10234 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| has_job_experience | 25480 | 2 | Y | 14802 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| requires_job_training | 25480 | 2 | N | 22525 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| no_of_employees | 25480.0 | NaN | NaN | NaN | 5667.04321 | 22877.928848 | -26.0 | 1022.0 | 2109.0 | 3504.0 | 602069.0 |
| yr_of_estab | 25480.0 | NaN | NaN | NaN | 1979.409929 | 42.366929 | 1800.0 | 1976.0 | 1997.0 | 2005.0 | 2016.0 |
| region_of_employment | 25480 | 5 | Northeast | 7195 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| prevailing_wage | 25480.0 | NaN | NaN | NaN | 74455.814592 | 52815.942327 | 2.1367 | 34015.48 | 70308.21 | 107735.5125 | 319210.27 |
| unit_of_wage | 25480 | 4 | Year | 22962 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| full_time_position | 25480 | 2 | Y | 22773 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| case_status | 25480 | 2 | Certified | 17018 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
category = ['continent', 'education_of_employee', 'has_job_experience', 'requires_job_training', 'region_of_employment',
            'unit_of_wage', 'full_time_position', 'case_status']
# print the frequency of every level for each categorical column
for column in category:
    print(df1[column].value_counts())
    print('_'*40)
Asia             16861
Europe            3732
North America     3292
South America      852
Africa             551
Oceania            192
Name: continent, dtype: int64
________________________________________
Bachelor's     10234
Master's        9634
High School     3420
Doctorate       2192
Name: education_of_employee, dtype: int64
________________________________________
Y    14802
N    10678
Name: has_job_experience, dtype: int64
________________________________________
N    22525
Y     2955
Name: requires_job_training, dtype: int64
________________________________________
Northeast    7195
South        7017
West         6586
Midwest      4307
Island        375
Name: region_of_employment, dtype: int64
________________________________________
Year     22962
Hour      2157
Week       272
Month       89
Name: unit_of_wage, dtype: int64
________________________________________
Y    22773
N     2707
Name: full_time_position, dtype: int64
________________________________________
Certified    17018
Denied        8462
Name: case_status, dtype: int64
________________________________________
df1.info()
<class 'pandas.core.frame.DataFrame'>
Index: 25480 entries, EZYV01 to EZYV25480
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   continent              25480 non-null  category
 1   education_of_employee  25480 non-null  category
 2   has_job_experience     25480 non-null  category
 3   requires_job_training  25480 non-null  category
 4   no_of_employees        25480 non-null  int64
 5   yr_of_estab            25480 non-null  int64
 6   region_of_employment   25480 non-null  category
 7   prevailing_wage        25480 non-null  float64
 8   unit_of_wage           25480 non-null  category
 9   full_time_position     25480 non-null  category
 10  case_status            25480 non-null  category
dtypes: category(8), float64(1), int64(2)
memory usage: 2.0+ MB
df1 = df.copy()
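def use_floor_function(num):
    # Despite its name, this helper returns the absolute value of a negative input
    if num < 0:
        num = abs(num)
    return num
# no_of_employees has negative entries (minimum of -26 above), assumed here to be sign errors
neg_var_col = ['no_of_employees']
for col_name in neg_var_col:
    df1[col_name] = df1[col_name].apply(use_floor_function)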
df1.no_of_employees
case_id
EZYV01 14513
EZYV02 2412
EZYV03 44444
EZYV04 98
EZYV05 1082
...
EZYV25476 2601
EZYV25477 3274
EZYV25478 1121
EZYV25479 1918
EZYV25480 3195
Name: no_of_employees, Length: 25480, dtype: int64
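Before overwriting the negative employee counts it can help to see how many rows are affected. The sketch below uses a short invented Series (the values are hypothetical) and the vectorized `Series.abs()`, which gives the same result as applying the helper row by row. Treating a negative count as a sign error is an assumption; the source data does not say why these values are negative.

```python
import pandas as pd

# Invented values; two of them are negative, as in the raw column.
employees = pd.Series([14513, -26, 98, -11], name="no_of_employees")

# Count the affected rows before changing anything.
n_negative = int((employees < 0).sum())

# Vectorized equivalent of applying abs() to each negative entry.
cleaned = employees.abs()

print(n_negative)
print(cleaned.tolist())
```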
# Let's look at the statistical summary of the data
df1.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| continent | 25480 | 6 | Asia | 16861 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| education_of_employee | 25480 | 4 | Bachelor's | 10234 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| has_job_experience | 25480 | 2 | Y | 14802 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| requires_job_training | 25480 | 2 | N | 22525 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| no_of_employees | 25480.0 | NaN | NaN | NaN | 5667.089207 | 22877.917453 | 11.0 | 1022.0 | 2109.0 | 3504.0 | 602069.0 |
| yr_of_estab | 25480.0 | NaN | NaN | NaN | 1979.409929 | 42.366929 | 1800.0 | 1976.0 | 1997.0 | 2005.0 | 2016.0 |
| region_of_employment | 25480 | 5 | Northeast | 7195 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| prevailing_wage | 25480.0 | NaN | NaN | NaN | 74455.814592 | 52815.942327 | 2.1367 | 34015.48 | 70308.21 | 107735.5125 | 319210.27 |
| unit_of_wage | 25480 | 4 | Year | 22962 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| full_time_position | 25480 | 2 | Y | 22773 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| case_status | 25480 | 2 | Certified | 17018 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
# Convert every wage to a yearly amount, assuming 40 hours per week, 4 weeks per month and 12 months per year
df1.loc[df1['unit_of_wage'] == 'Hour', "prevailing_wage"] *= 40 * 4 * 12
df1.loc[df1['unit_of_wage'] == 'Week', "prevailing_wage"] *= 4 * 12
df1.loc[df1['unit_of_wage'] == 'Month', "prevailing_wage"] *= 12
df1[['prevailing_wage','unit_of_wage']]
| case_id | prevailing_wage | unit_of_wage |
|---|---|---|
| EZYV01 | 1137029.568 | Hour |
| EZYV02 | 83425.650 | Year |
| EZYV03 | 122996.860 | Year |
| EZYV04 | 83434.030 | Year |
| EZYV05 | 149907.390 | Year |
| ... | ... | ... |
| EZYV25476 | 77092.570 | Year |
| EZYV25477 | 279174.790 | Year |
| EZYV25478 | 146298.850 | Year |
| EZYV25479 | 86154.770 | Year |
| EZYV25480 | 70876.910 | Year |
25480 rows × 2 columns
df[['prevailing_wage','unit_of_wage']]
| case_id | prevailing_wage | unit_of_wage |
|---|---|---|
| EZYV01 | 592.2029 | Hour |
| EZYV02 | 83425.6500 | Year |
| EZYV03 | 122996.8600 | Year |
| EZYV04 | 83434.0300 | Year |
| EZYV05 | 149907.3900 | Year |
| ... | ... | ... |
| EZYV25476 | 77092.5700 | Year |
| EZYV25477 | 279174.7900 | Year |
| EZYV25478 | 146298.8500 | Year |
| EZYV25479 | 86154.7700 | Year |
| EZYV25480 | 70876.9100 | Year |
25480 rows × 2 columns
All prevailing wages are now expressed as yearly amounts. The unit_of_wage column is left unchanged, so it still records the unit in which each wage was originally reported.
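The three `.loc` assignments above can also be written as a single lookup-and-multiply. The sketch below is only an illustration on mostly made-up rows: `ANNUAL_FACTORS` and `annualize_wage` are hypothetical names, and the factors mirror the notebook's choice of 40 hours × 4 weeks × 12 months (1,920 hours and 48 weeks per year). A 52-week year (2,080 hours) is a common alternative assumption and would give larger annual figures for hourly and weekly wages.

```python
import pandas as pd

# Multipliers matching the notebook's conversion (assumption: 4 weeks per month).
ANNUAL_FACTORS = {"Hour": 40 * 4 * 12, "Week": 4 * 12, "Month": 12, "Year": 1}

def annualize_wage(frame, factors=ANNUAL_FACTORS):
    """Return prevailing_wage scaled to a yearly amount, leaving `frame` untouched."""
    multiplier = frame["unit_of_wage"].map(factors).astype(float)
    return frame["prevailing_wage"] * multiplier

# Illustrative rows; only the first wage (EZYV01) comes from the table above.
demo = pd.DataFrame(
    {"prevailing_wage": [592.2029, 1000.0, 5000.0, 83425.65],
     "unit_of_wage": ["Hour", "Week", "Month", "Year"]}
)
demo["annual_wage"] = annualize_wage(demo)
print(demo)
```

Because the function returns a new Series instead of overwriting `prevailing_wage`, it can be re-run safely without multiplying the same rows twice.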
df1
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | N | N | 14513 | 2007 | West | 1137029.568 | Hour | Y | Denied |
| EZYV02 | Asia | Master's | Y | N | 2412 | 2002 | Northeast | 83425.650 | Year | Y | Certified |
| EZYV03 | Asia | Bachelor's | N | Y | 44444 | 2008 | West | 122996.860 | Year | Y | Denied |
| EZYV04 | Asia | Bachelor's | N | N | 98 | 1897 | West | 83434.030 | Year | Y | Denied |
| EZYV05 | Africa | Master's | Y | N | 1082 | 2005 | South | 149907.390 | Year | Y | Certified |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| EZYV25476 | Asia | Bachelor's | Y | Y | 2601 | 2008 | South | 77092.570 | Year | Y | Certified |
| EZYV25477 | Asia | High School | Y | N | 3274 | 2006 | Northeast | 279174.790 | Year | Y | Certified |
| EZYV25478 | Asia | Master's | Y | N | 1121 | 1910 | South | 146298.850 | Year | N | Certified |
| EZYV25479 | Asia | Master's | Y | Y | 1918 | 1887 | West | 86154.770 | Year | Y | Certified |
| EZYV25480 | Asia | Bachelor's | Y | N | 3195 | 1960 | Midwest | 70876.910 | Year | Y | Certified |
25480 rows × 11 columns
# Outlier detection using boxplots
numeric_columns = df1.select_dtypes(include=np.number).columns.tolist()
# Plot a boxplot for every numeric column to check for outliers
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numeric_columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df1[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
# Outlier treatment
def treatment_outliers(df1, col):
    Q1 = df1[col].quantile(0.25)  # 25th percentile
    Q3 = df1[col].quantile(0.75)  # 75th percentile
    IQR = Q3 - Q1
    Lower_Whisker = Q1 - 1.5 * IQR
    Upper_Whisker = Q3 + 1.5 * IQR
    # all the values smaller than Lower_Whisker will be assigned the value of Lower_Whisker
    # all the values greater than Upper_Whisker will be assigned the value of Upper_Whisker
    df1[col] = np.clip(df1[col], Lower_Whisker, Upper_Whisker)
    return df1

def treat_all_outliers(df1, col_list):
    for c in col_list:
        df1 = treatment_outliers(df1, c)
    return df1
# treating the outliers
numerical_col = df1.select_dtypes(include=np.number).columns.tolist()
df1 = treat_all_outliers(df1, numerical_col)
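To make the whisker rule concrete, here is a small worked example on made-up numbers (not taken from the dataset). `clip_to_whiskers` is a hypothetical helper that repeats the steps of `treatment_outliers` for a single Series. One side effect worth keeping in mind is that clipping can produce values that never occurred in the data, which is why `yr_of_estab` shows a minimum of 1932.5 in the summary after treatment.

```python
import pandas as pd

def clip_to_whiskers(values, whis=1.5):
    """Clip a Series to [Q1 - whis*IQR, Q3 + whis*IQR]; also return the two bounds."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - whis * iqr, q3 + whis * iqr
    return values.clip(lower, upper), lower, upper

sample = pd.Series([1, 2, 3, 4, 100])  # 100 is an obvious outlier
clipped, lower, upper = clip_to_whiskers(sample)
print(lower, upper)      # whisker bounds
print(clipped.tolist())  # the outlier is pulled back to the upper whisker
```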
# Let's look at the boxplots to see whether the outliers have been treated
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numeric_columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df1[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
# Lets look at the statistical summary of the data
df1.describe(include="all").T
| | count | unique | top | freq | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|---|---|---|
| continent | 25480 | 6 | Asia | 16861 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| education_of_employee | 25480 | 4 | Bachelor's | 10234 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| has_job_experience | 25480 | 2 | Y | 14802 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| requires_job_training | 25480 | 2 | N | 22525 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| no_of_employees | 25480.0 | NaN | NaN | NaN | 2495.034929 | 1874.429628 | 11.0 | 1022.0 | 2109.0 | 3504.0 | 7227.0 |
| yr_of_estab | 25480.0 | NaN | NaN | NaN | 1985.957143 | 25.813205 | 1932.5 | 1976.0 | 1997.0 | 2005.0 | 2016.0 |
| region_of_employment | 25480 | 5 | Northeast | 7195 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| prevailing_wage | 25480.0 | NaN | NaN | NaN | 95533.802286 | 65328.842508 | 100.0 | 47092.79 | 82801.61 | 124783.27 | 241318.99 |
| unit_of_wage | 25480 | 4 | Year | 22962 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| full_time_position | 25480 | 2 | Y | 22773 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| case_status | 25480 | 2 | Certified | 17018 | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
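# Encode the binary columns as integers (1 = Certified / Y, 0 = Denied / N).
# Assigning the result back avoids calling replace(..., inplace=True) on a column selection.
df1["case_status"] = df1["case_status"].map({"Certified": 1, "Denied": 0}).astype(int)
df1["full_time_position"] = df1["full_time_position"].map({"Y": 1, "N": 0}).astype(int)
df1["requires_job_training"] = df1["requires_job_training"].map({"Y": 1, "N": 0}).astype(int)
df1["has_job_experience"] = df1["has_job_experience"].map({"Y": 1, "N": 0}).astype(int)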
# Outlier detection using boxplots (the numeric columns now include the encoded binary columns)
numeric_columns = df1.select_dtypes(include=np.number).columns.tolist()
# Plot a boxplot for every numeric column to check for outliers
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numeric_columns):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(df1[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
df1
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position | case_status |
|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | 0 | 0 | 7227 | 2007.0 | West | 241318.99 | Hour | 1 | 0 |
| EZYV02 | Asia | Master's | 1 | 0 | 2412 | 2002.0 | Northeast | 83425.65 | Year | 1 | 1 |
| EZYV03 | Asia | Bachelor's | 0 | 1 | 7227 | 2008.0 | West | 122996.86 | Year | 1 | 0 |
| EZYV04 | Asia | Bachelor's | 0 | 0 | 98 | 1932.5 | West | 83434.03 | Year | 1 | 0 |
| EZYV05 | Africa | Master's | 1 | 0 | 1082 | 2005.0 | South | 149907.39 | Year | 1 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| EZYV25476 | Asia | Bachelor's | 1 | 1 | 2601 | 2008.0 | South | 77092.57 | Year | 1 | 1 |
| EZYV25477 | Asia | High School | 1 | 0 | 3274 | 2006.0 | Northeast | 241318.99 | Year | 1 | 1 |
| EZYV25478 | Asia | Master's | 1 | 0 | 1121 | 1932.5 | South | 146298.85 | Year | 0 | 1 |
| EZYV25479 | Asia | Master's | 1 | 1 | 1918 | 1932.5 | West | 86154.77 | Year | 1 | 1 |
| EZYV25480 | Asia | Bachelor's | 1 | 0 | 3195 | 1960.0 | Midwest | 70876.91 | Year | 1 | 1 |
25480 rows × 11 columns
df1.info()
<class 'pandas.core.frame.DataFrame'>
Index: 25480 entries, EZYV01 to EZYV25480
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   continent              25480 non-null  category
 1   education_of_employee  25480 non-null  category
 2   has_job_experience     25480 non-null  int64
 3   requires_job_training  25480 non-null  int64
 4   no_of_employees        25480 non-null  int64
 5   yr_of_estab            25480 non-null  float64
 6   region_of_employment   25480 non-null  category
 7   prevailing_wage        25480 non-null  float64
 8   unit_of_wage           25480 non-null  category
 9   full_time_position     25480 non-null  int64
 10  case_status            25480 non-null  int64
dtypes: category(4), float64(2), int64(5)
memory usage: 2.7+ MB
sns.histplot(df1.no_of_employees, kde = True)
sns.boxplot(x=df1.no_of_employees, orient="h");
sns.histplot(df1.yr_of_estab, kde = True)
sns.boxplot(x=df1.yr_of_estab, orient="h");
labeled_barplot(df1, "continent", perc=True)
Most of the employees are from Asia. Very few are from Oceania.
labeled_barplot(df1, "education_of_employee", perc=True)
40.2% of the employees have a Bachelor's degree and 37.8% have a Master's degree. Only 8.6% have a Doctorate.
labeled_barplot(df1, "has_job_experience", perc=True)
58.1% of the applicants have job experience.
labeled_barplot(df1, "requires_job_training", perc=True)
88.4% of the applicants do not require job training.
labeled_barplot(df1, "region_of_employment", perc=True)
Very few applications (1.5%) are for jobs in the Island region. Most are for the Northeast, South, and West regions.
labeled_barplot(df1, "unit_of_wage", perc=True)
90.1% of the employees are paid yearly wages, 8.5% hourly, 1.1% weekly, and 0.3% monthly.
labeled_barplot(df1, "full_time_position", perc=True)
89.4% of the employees are in full-time positions; the rest are not.
labeled_barplot(df1, "case_status", perc=True)
66.8% of the cases are Certified.
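# Restrict the correlation matrix to numeric columns; the categorical columns cannot be correlated directly
sns.heatmap(df1.select_dtypes(include=np.number).corr(), cmap="YlGnBu", vmin=-1, vmax=1, annot=True)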
sns.pairplot(df1);
In the heatmap above, the strongest correlation is between case_status and has_job_experience, followed by the one between requires_job_training and full_time_position. Some pairs are negatively correlated. Note that correlation does not imply causation.
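A heatmap can be hard to read precisely, so it may help to list the correlations with the target in sorted order. The sketch below uses a tiny made-up frame (the values are illustrative only, not the notebook's data) and a hypothetical helper `correlation_with_target`. It restricts itself to numeric columns so that categorical columns such as `continent` do not cause errors.

```python
import pandas as pd

def correlation_with_target(frame, target):
    """Pearson correlation of every numeric column with `target`, strongest first."""
    numeric = frame.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)

# Made-up rows for illustration.
toy = pd.DataFrame({
    "has_job_experience": [1, 1, 0, 0, 1, 0],
    "requires_job_training": [0, 1, 0, 1, 0, 1],
    "continent": ["Asia", "Asia", "Europe", "Africa", "Asia", "Europe"],
    "case_status": [1, 1, 0, 0, 1, 1],
})
print(correlation_with_target(toy, "case_status"))
```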
sns.countplot(x="continent", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the certified applicants are from Asia, and so are most of the denied applicants. The fewest denials are for applicants from Oceania.
sns.countplot(x="education_of_employee", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the certified applicants have a Master's degree, followed by those with a Bachelor's degree. Denials are very few among Doctorate candidates.
sns.countplot(x="has_job_experience", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the applicants whose case_status is certified have job experience. Denials are more common among applicants without job experience.
sns.countplot(x="requires_job_training", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the certified cases are for candidates who did not require job training. Very few denials are among those who required job training.
sns.countplot(x="region_of_employment", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the certified applications are for the South region, followed by the Northeast and West. Very few denied applications are for the Island region.
sns.countplot(x="unit_of_wage", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the employees receive yearly wages, and the largest number of certified cases is among yearly-waged applicants. The fewest denials are among monthly-waged and weekly-waged applicants.
sns.countplot(x="full_time_position", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the certified applicants are in full-time positions. Very few of the denied candidates are in full-time positions.
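Count plots compare raw counts, so large groups dominate both the certified and the denied bars. To compare groups of different sizes it can be more informative to look at the share of certified cases within each group. The following is a minimal sketch on made-up data; `certification_rate` is a hypothetical helper, and `case_status` is assumed to be encoded as 1 for Certified and 0 for Denied, as above.

```python
import pandas as pd

def certification_rate(frame, column, target="case_status"):
    """Row-normalised crosstab: share of denied (0) and certified (1) cases per level."""
    return pd.crosstab(frame[column], frame[target], normalize="index")

# Made-up rows: a large group (Asia) and a small group (Oceania).
toy = pd.DataFrame({
    "continent": ["Asia"] * 6 + ["Oceania"] * 2,
    "case_status": [1, 1, 1, 1, 0, 0, 0, 1],
})
rates = certification_rate(toy, "continent")
print(rates)
```

In this toy example Asia has more denials in absolute terms (2 versus 1) but a lower denial rate (about 33% versus 50%), which is the kind of difference a count plot alone can hide.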
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='no_of_employees', y = df1['yr_of_estab'], data = df1)
Companies established from 2000 onward tend to have higher employee counts.
fig = plt.figure(figsize= (10,5))
ax = sns.regplot(x ='yr_of_estab', y = df1['prevailing_wage'], data = df1)
The highest prevailing wages are found among companies established between 1975 and 2000.
fig = plt.figure(figsize= (10,5))
ax = sns.scatterplot(x ='no_of_employees', y = df1['prevailing_wage'], data = df1)
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="no_of_employees", y="yr_of_estab", data=df1, hue='case_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="prevailing_wage", y="yr_of_estab", data=df1, hue='case_status', palette='tab10' )
plt.show()
figure = plt.figure(figsize=(8,7))
sns.scatterplot(x="prevailing_wage", y="no_of_employees", data=df1, hue='case_status', palette='tab10' )
plt.show()
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='continent', y = 'no_of_employees', data = df1, hue = 'case_status')
After outlier treatment, the graph changes. Employees from Africa working for companies with higher employee counts have the highest acceptance. Employees from Oceania working for companies with lower employee counts have the most rejections.
labeled_barplot(df1, "education_of_employee", perc=True)
sns.countplot(x="education_of_employee", hue="case_status", data=df1, palette='Set1',saturation=50 );
In the given dataset, applicants with a Bachelor's degree are the largest group, followed by those with a Master's degree. More cases are certified among employees with a Master's degree than among those with a Bachelor's degree. The fewest rejections are among employees with a Doctorate.
sns.countplot(x="continent", hue="case_status", data=df1, palette='Set1',saturation=50 );
The most certified cases are among employees from Asia, and the fewest are among employees from Oceania. Rejections are few among employees from Africa, followed by South America.
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='continent', y = 'no_of_employees', data = df1, hue = 'case_status')
labeled_barplot(df1, "has_job_experience", perc=True)
sns.countplot(x="has_job_experience", hue="case_status", data=df1, palette='Set1',saturation=50 );
In the given dataset, most employees have job experience. More cases are certified among employees who have job experience, and more are rejected among employees who do not.
fig = plt.figure(figsize= (15,5))
ax = sns.barplot(x ='has_job_experience', y = 'no_of_employees', data = df1, hue = 'case_status')
Applicants who have job experience and come from companies with higher employee counts have a higher chance of being accepted.
sns.countplot(x="unit_of_wage", hue="case_status", data=df1, palette='Set1',saturation=50 );
Most of the employees receive yearly wages, and the largest number of certified cases is among yearly-waged applicants. The fewest rejections are among monthly-waged and weekly-waged applicants.
plt.figure(figsize=(15,5))
ax = sns.barplot(x='case_status', y='prevailing_wage', data=df1)
On average, certified applicants have higher prevailing wages than rejected applicants.
X = df1.drop(['case_status'], axis = 1)
y = df1['case_status']
X.head()
| case_id | continent | education_of_employee | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | region_of_employment | prevailing_wage | unit_of_wage | full_time_position |
|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | Asia | High School | 0 | 0 | 7227 | 2007.0 | West | 241318.99 | Hour | 1 |
| EZYV02 | Asia | Master's | 1 | 0 | 2412 | 2002.0 | Northeast | 83425.65 | Year | 1 |
| EZYV03 | Asia | Bachelor's | 0 | 1 | 7227 | 2008.0 | West | 122996.86 | Year | 1 |
| EZYV04 | Asia | Bachelor's | 0 | 0 | 98 | 1932.5 | West | 83434.03 | Year | 1 |
| EZYV05 | Africa | Master's | 1 | 0 | 1082 | 2005.0 | South | 149907.39 | Year | 1 |
y.head()
case_id
EZYV01    0
EZYV02    1
EZYV03    0
EZYV04    0
EZYV05    1
Name: case_status, dtype: int64
X = pd.get_dummies(X, columns = X.select_dtypes(include = ["object","category"]).columns.tolist(), drop_first = True)
X.head()
| case_id | has_job_experience | requires_job_training | no_of_employees | yr_of_estab | prevailing_wage | full_time_position | continent_Asia | continent_Europe | continent_North America | continent_Oceania | continent_South America | education_of_employee_Doctorate | education_of_employee_High School | education_of_employee_Master's | region_of_employment_Midwest | region_of_employment_Northeast | region_of_employment_South | region_of_employment_West | unit_of_wage_Month | unit_of_wage_Week | unit_of_wage_Year |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| EZYV01 | 0 | 0 | 7227 | 2007.0 | 241318.99 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| EZYV02 | 1 | 0 | 2412 | 2002.0 | 83425.65 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| EZYV03 | 0 | 1 | 7227 | 2008.0 | 122996.86 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| EZYV04 | 0 | 0 | 98 | 1932.5 | 83434.03 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| EZYV05 | 1 | 0 | 1082 | 2005.0 | 149907.39 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 |
y.head()
case_id
EZYV01    0
EZYV02    1
EZYV03    0
EZYV04    0
EZYV05    1
Name: case_status, dtype: int64
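With `drop_first=True`, the first level of each categorical column (in category order, which is alphabetical by default) becomes the implicit baseline and gets no column of its own. That is why there is no `continent_Africa`, `education_of_employee_Bachelor's`, `region_of_employment_Island`, or `unit_of_wage_Hour` column above. A small sketch on made-up rows follows; `dtype=int` is passed because recent pandas versions return booleans by default.

```python
import pandas as pd

# Made-up rows for illustration.
toy = pd.DataFrame({
    "unit_of_wage": ["Hour", "Year", "Week", "Month"],
    "full_time_position": [1, 1, 0, 1],
})
encoded = pd.get_dummies(toy, columns=["unit_of_wage"], drop_first=True, dtype=int)
print(encoded)
# A row with zeros in all unit_of_wage_* columns corresponds to the baseline "Hour".
```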
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=1,stratify=y)
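Passing `stratify=y` keeps the share of certified and denied cases roughly the same in the training and test sets, which matters here because the classes are imbalanced (66.8% Certified). A quick check on made-up labels, as a sketch (the names `X_demo` and `y_demo` are placeholders):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 70% of them in the positive class.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 70 + [0] * 30)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.30, random_state=1, stratify=y_demo
)
print("train share of class 1:", y_tr.mean())
print("test share of class 1: ", y_te.mean())
```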
from statsmodels.stats.outliers_influence import variance_inflation_factor
def checking_vif(predictors):
    # Compute the variance inflation factor for every predictor column
    vif = pd.DataFrame()
    vif["feature"] = predictors.columns
    vif["VIF"] = [
        variance_inflation_factor(predictors.values, i)
        for i in range(len(predictors.columns))
    ]
    return vif
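For intuition, the VIF of a predictor is 1 / (1 - R^2), where R^2 comes from regressing that predictor on all the others. The sketch below computes this directly with NumPy on made-up columns; `vif_by_hand` is a hypothetical helper, not part of statsmodels. It fits each auxiliary regression with an intercept, so its numbers should be expected to match `variance_inflation_factor` only when the matrix passed to statsmodels also contains a constant column.

```python
import numpy as np

def vif_by_hand(X):
    """VIF for each column of a 2-D array: 1 / (1 - R^2) of column ~ other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for i in range(k):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ coef
        r2 = 1.0 - residuals @ residuals / np.sum((y - y.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return vifs

# Made-up columns for illustration.
x1 = [1, 2, 3, 4]
uncorrelated = np.column_stack([x1, [1, -1, -1, 1]])
correlated = np.column_stack([x1, [1, 2, 3, 5]])
print(vif_by_hand(uncorrelated))  # close to 1: the columns share no linear information
print(vif_by_hand(correlated))    # much larger: the columns nearly duplicate each other
```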
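Both kinds of misclassification are important: predicting a case as certified when it should be denied, and predicting it as denied when it should be certified.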
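When both kinds of error matter, the F1 score is a common single number to track, because it is the harmonic mean of precision and recall and drops when either one is poor. As a sketch, the metrics can be computed by hand from confusion-matrix counts; the counts below are invented for illustration and `scores_from_counts` is a hypothetical helper.

```python
def scores_from_counts(tp, fp, fn, tn):
    """Accuracy, recall, precision and F1 from the four confusion-matrix cells."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn)      # share of actual positives that were found
    precision = tp / (tp + fp)   # share of predicted positives that were right
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "recall": recall, "precision": precision, "f1": f1}

# Invented counts: 80 true positives, 20 false positives, 20 false negatives, 30 true negatives.
scores = scores_from_counts(tp=80, fp=20, fn=20, tn=30)
print(scores)
```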
## Function to create confusion matrix
def make_confusion_matrix_for_model(model, y_actual, labels=[1, 0]):
    # Note: predictions are always made on the global X_test, and the `labels`
    # argument is not used; the matrix rows and columns are ordered [0, 1].
    y_predict = model.predict(X_test)
    cm = metrics.confusion_matrix(y_actual, y_predict, labels=[0, 1])
    df_cm = pd.DataFrame(cm, index=["Actual - No", "Actual - Yes"],
                         columns=["Predicted - No", "Predicted - Yes"])
    group_counts = ["{0:0.0f}".format(value) for value in cm.flatten()]
    group_percentages = ["{0:.2%}".format(value) for value in cm.flatten() / np.sum(cm)]
    # Each cell shows the count and its share of all test predictions
    labels = [f"{v1}\n{v2}" for v1, v2 in zip(group_counts, group_percentages)]
    labels = np.asarray(labels).reshape(2, 2)
    plt.figure(figsize=(10, 7))
    sns.heatmap(df_cm, annot=labels, fmt='')
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
## Function to calculate different metric scores of the model - Accuracy, Recall, Precision and F1
def get_metrics_score(model, flag=True):
    '''
    model : classifier to predict values of X
    flag : if True (the default), print the scores in addition to returning them
    '''
    # Predicting on the train and test sets
    pred_train = model.predict(X_train)
    pred_test = model.predict(X_test)
    # Accuracy of the model
    train_acc = model.score(X_train, y_train)
    test_acc = model.score(X_test, y_test)
    # Recall of the model
    train_recall = metrics.recall_score(y_train, pred_train)
    test_recall = metrics.recall_score(y_test, pred_test)
    # Precision of the model
    train_precision = metrics.precision_score(y_train, pred_train)
    test_precision = metrics.precision_score(y_test, pred_test)
    # F1-score of the model
    train_fscore = metrics.f1_score(y_train, pred_train)
    test_fscore = metrics.f1_score(y_test, pred_test)
    # List holding the train and test results
    score_list = [train_acc, test_acc, train_recall, test_recall,
                  train_precision, test_precision, train_fscore, test_fscore]
    # The scores are printed only when flag is True (the default)
    if flag:
        print("Accuracy on training set : ", train_acc)
        print("Accuracy on test set : ", test_acc)
        print("Recall on training set : ", train_recall)
        print("Recall on test set : ", test_recall)
        print("Precision on training set : ", train_precision)
        print("Precision on test set : ", test_precision)
        print("f1-score on training set : ", train_fscore)
        print("f1-score on test set : ", test_fscore)
    return score_list  # returning the list with train and test scores
dTree = DecisionTreeClassifier(criterion = 'gini', random_state=1)
dTree.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
make_confusion_matrix_for_model(dTree,y_test)
#Using above defined function to get accuracy, recall and precision on train and test set
dTree_score=get_metrics_score(dTree)
Accuracy on training set :  1.0
Accuracy on test set :  0.6593406593406593
Recall on training set :  1.0
Recall on test set :  0.7380999020568071
Precision on training set :  1.0
Precision on test set :  0.7483614697120159
f1-score on training set :  1.0
f1-score on test set :  0.7431952662721892
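A training score of 1.0 next to a test accuracy of about 0.66 suggests that the fully grown tree has memorised the training data. One common remedy is to limit the tree's growth (pre-pruning). The sketch below shows the idea on synthetic data generated with `make_classification`; the data, the names `full_tree` and `shallow_tree`, and the choice `max_depth=3` are illustrative assumptions, not settings taken from this notebook.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy two-class data (flip_y adds label noise).
X_syn, y_syn = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=1
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.30, random_state=1, stratify=y_syn
)

# An unrestricted tree versus one capped at three levels.
full_tree = DecisionTreeClassifier(random_state=1).fit(X_tr, y_tr)
shallow_tree = DecisionTreeClassifier(max_depth=3, random_state=1).fit(X_tr, y_tr)

for name, model in [("full", full_tree), ("max_depth=3", shallow_tree)]:
    print(name, "train:", model.score(X_tr, y_tr), "test:", model.score(X_te, y_te))
```

Comparing the gap between the training and test scores of the two models gives a rough sense of how much a depth limit reduces overfitting; a suitable depth for the visa data would still have to be found by tuning.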
feature_names = list(X.columns)
print(feature_names)
['has_job_experience', 'requires_job_training', 'no_of_employees', 'yr_of_estab', 'prevailing_wage', 'full_time_position', 'continent_Asia', 'continent_Europe', 'continent_North America', 'continent_Oceania', 'continent_South America', 'education_of_employee_Doctorate', 'education_of_employee_High School', "education_of_employee_Master's", 'region_of_employment_Midwest', 'region_of_employment_Northeast', 'region_of_employment_South', 'region_of_employment_West', 'unit_of_wage_Month', 'unit_of_wage_Week', 'unit_of_wage_Year']
# Text report showing the rules of a decision tree -
print(tree.export_text(dTree,feature_names=feature_names,show_weights=True))
|--- education_of_employee_High School <= 0.50 | |--- has_job_experience <= 0.50 | | |--- unit_of_wage_Year <= 0.50 | | | |--- education_of_employee_Doctorate <= 0.50 | | | | |--- education_of_employee_Master's <= 0.50 | | | | | |--- unit_of_wage_Week <= 0.50 | | | | | | |--- requires_job_training <= 0.50 | | | | | | | |--- unit_of_wage_Month <= 0.50 | | | | | | | | |--- no_of_employees <= 256.00 | | | | | | | | | |--- no_of_employees <= 26.00 | | | | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- continent_North America > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_employees > 26.00 | | | | | | | | | | |--- region_of_employment_Northeast <= 0.50 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | | |--- region_of_employment_Northeast > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_employees > 256.00 | | | | | | | | | |--- no_of_employees <= 349.50 | | | | | | | | | | |--- no_of_employees <= 321.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_employees > 321.00 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- no_of_employees > 349.50 | | | | | | | | | | |--- yr_of_estab <= 2003.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | | |--- yr_of_estab > 2003.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | |--- unit_of_wage_Month > 0.50 | | | | | | | | |--- no_of_employees <= 1500.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_employees > 1500.50 | | | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | | | |--- yr_of_estab <= 2008.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- yr_of_estab > 2008.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | 
| | | | | |--- region_of_employment_South > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- requires_job_training > 0.50 | | | | | | | |--- no_of_employees <= 2458.50 | | | | | | | | |--- unit_of_wage_Month <= 0.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- unit_of_wage_Month > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_employees > 2458.50 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | |--- unit_of_wage_Week > 0.50 | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | |--- prevailing_wage <= 173057.01 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- prevailing_wage > 173057.01 | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | |--- no_of_employees <= 5544.50 | | | | | | | | | | |--- yr_of_estab <= 2013.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- yr_of_estab > 2013.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_employees > 5544.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | |--- no_of_employees <= 2983.00 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- no_of_employees > 2983.00 | | | | | | | | |--- no_of_employees <= 3407.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 3407.00 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- education_of_employee_Master's > 0.50 | | | | | |--- continent_North America <= 0.50 | | | | | | |--- unit_of_wage_Week <= 0.50 | | | | | | | |--- yr_of_estab <= 1996.50 | | | | | | | | |--- prevailing_wage <= 206545.16 | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- prevailing_wage > 206545.16 | | | | | | 
| | | |--- no_of_employees <= 4647.50 | | | | | | | | | | |--- no_of_employees <= 3892.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- no_of_employees > 3892.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_employees > 4647.50 | | | | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 1996.50 | | | | | | | | |--- yr_of_estab <= 1997.50 | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1997.50 | | | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | | | |--- yr_of_estab <= 2007.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- yr_of_estab > 2007.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | | | | |--- no_of_employees <= 771.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_employees > 771.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | |--- unit_of_wage_Week > 0.50 | | | | | | | |--- no_of_employees <= 664.50 | | | | | | | | |--- no_of_employees <= 416.00 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 416.00 | | | | | | | | | |--- region_of_employment_West <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- region_of_employment_West > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- no_of_employees > 664.50 | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | |--- continent_North America > 0.50 | | | | | | |--- prevailing_wage <= 10229.09 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- prevailing_wage > 10229.09 | | | | | | | |--- 
region_of_employment_West <= 0.50 | | | | | | | | |--- yr_of_estab <= 2010.50 | | | | | | | | | |--- no_of_employees <= 823.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 823.50 | | | | | | | | | | |--- no_of_employees <= 916.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 916.00 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- yr_of_estab > 2010.50 | | | | | | | | | |--- yr_of_estab <= 2012.50 | | | | | | | | | | |--- no_of_employees <= 1496.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_employees > 1496.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- yr_of_estab > 2012.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- region_of_employment_West > 0.50 | | | | | | | | |--- no_of_employees <= 1936.00 | | | | | | | | | |--- no_of_employees <= 795.00 | | | | | | | | | | |--- no_of_employees <= 387.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 387.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 795.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_employees > 1936.00 | | | | | | | | | |--- yr_of_estab <= 2006.00 | | | | | | | | | | |--- no_of_employees <= 3274.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_employees > 3274.50 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- yr_of_estab > 2006.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | |--- education_of_employee_Doctorate > 0.50 | | | | |--- region_of_employment_West <= 0.50 | | | | | |--- yr_of_estab <= 1955.00 | | | | | | |--- no_of_employees <= 2925.00 | | | | | | | |--- continent_Europe <= 0.50 | | | | | | | | |--- weights: [4.00, 0.00] 
class: 0 | | | | | | | |--- continent_Europe > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_employees > 2925.00 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- yr_of_estab > 1955.00 | | | | | | |--- continent_Asia <= 0.50 | | | | | | | |--- yr_of_estab <= 1984.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- yr_of_estab > 1984.50 | | | | | | | | |--- no_of_employees <= 653.50 | | | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | | | |--- no_of_employees <= 317.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_employees > 317.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- continent_North America > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 653.50 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- continent_Asia > 0.50 | | | | | | | |--- yr_of_estab <= 1994.50 | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | |--- yr_of_estab > 1994.50 | | | | | | | | |--- yr_of_estab <= 1996.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1996.00 | | | | | | | | | |--- yr_of_estab <= 2000.50 | | | | | | | | | | |--- no_of_employees <= 999.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_employees > 999.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- yr_of_estab > 2000.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | |--- region_of_employment_West > 0.50 | | | | | |--- yr_of_estab <= 1970.00 | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- yr_of_estab > 1970.00 | | | | | | |--- no_of_employees <= 1633.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_employees > 1633.00 | | | | | | | |--- unit_of_wage_Week <= 0.50 | | | | | | | | |--- weights: 
[8.00, 0.00] class: 0
| | | | | | | |--- unit_of_wage_Week > 0.50
| | | | | | | | |--- no_of_employees <= 4036.50
| | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- no_of_employees > 4036.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | |--- unit_of_wage_Year > 0.50
| | | |--- continent_Europe <= 0.50
| | | | |--- region_of_employment_Midwest <= 0.50
| | | | | |--- region_of_employment_South <= 0.50
| | | | | | |--- education_of_employee_Doctorate <= 0.50
| | | | | | | |--- education_of_employee_Master's <= 0.50
| | | | | | | | |--- full_time_position <= 0.50
| | | | | | | | | |--- weights: [95.00, 0.00] class: 0
| | | | | | | | |--- full_time_position > 0.50
| | | | | | | | | |--- yr_of_estab <= 2004.50
| | | | | | | | | | |--- prevailing_wage <= 217638.20
| | | | | | | | | | | |--- truncated branch of depth 26
| | | | | | | | | | |--- prevailing_wage > 217638.20
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- yr_of_estab > 2004.50
| | | | | | | | | | |--- prevailing_wage <= 147922.91
| | | | | | | | | | | |--- truncated branch of depth 19
| | | | | | | | | | |--- prevailing_wage > 147922.91
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | |--- education_of_employee_Master's > 0.50
| | | | | | | | |--- full_time_position <= 0.50
| | | | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | | | |--- prevailing_wage <= 231944.81
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- prevailing_wage > 231944.81
| | | | | | | | | | | |--- weights: [0.00, 11.00] class: 1
| | | | | | | | | |--- requires_job_training > 0.50
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | |--- full_time_position > 0.50
| | | | | | | | | |--- prevailing_wage <= 56476.72
| | | | | | | | | | |--- no_of_employees <= 368.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_employees > 368.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- prevailing_wage > 56476.72
| | | | | | | | | | |--- prevailing_wage <= 64817.39
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- prevailing_wage > 64817.39
| | | | | | | | | | | |--- truncated branch of depth 18
| | | | | | |--- education_of_employee_Doctorate > 0.50
| | | | | | | |--- no_of_employees <= 84.00
| | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 84.00
| | | | | | | | |--- prevailing_wage <= 61070.79
| | | | | | | | | |--- prevailing_wage <= 58162.18
| | | | | | | | | | |--- no_of_employees <= 439.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_employees > 439.00
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | |--- prevailing_wage > 58162.18
| | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | |--- prevailing_wage > 61070.79
| | | | | | | | | |--- no_of_employees <= 1011.00
| | | | | | | | | | |--- no_of_employees <= 889.50
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- no_of_employees > 889.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- no_of_employees > 1011.00
| | | | | | | | | | |--- no_of_employees <= 3054.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_employees > 3054.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | |--- region_of_employment_South > 0.50
| | | | | | |--- requires_job_training <= 0.50
| | | | | | | |--- full_time_position <= 0.50
| | | | | | | | |--- no_of_employees <= 1687.50
| | | | | | | | | |--- yr_of_estab <= 2011.00
| | | | | | | | | | |--- yr_of_estab <= 1932.75
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- yr_of_estab > 1932.75
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- yr_of_estab > 2011.00
| | | | | | | | | | |--- no_of_employees <= 580.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- no_of_employees > 580.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- no_of_employees > 1687.50
| | | | | | | | | |--- no_of_employees <= 3372.00
| | | | | | | | | | |--- prevailing_wage <= 168822.02
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- prevailing_wage > 168822.02
| | | | | | | | | | | |--- weights: [0.00, 12.00] class: 1
| | | | | | | | | |--- no_of_employees > 3372.00
| | | | | | | | | | |--- no_of_employees <= 6703.50
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | | |--- no_of_employees > 6703.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | |--- full_time_position > 0.50
| | | | | | | | |--- no_of_employees <= 3438.00
| | | | | | | | | |--- no_of_employees <= 3341.00
| | | | | | | | | | |--- prevailing_wage <= 129022.18
| | | | | | | | | | | |--- truncated branch of depth 22
| | | | | | | | | | |--- prevailing_wage > 129022.18
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | |--- no_of_employees > 3341.00
| | | | | | | | | | |--- prevailing_wage <= 26857.03
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 26857.03
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- no_of_employees > 3438.00
| | | | | | | | | |--- yr_of_estab <= 2002.50
| | | | | | | | | | |--- continent_North America <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 12
| | | | | | | | | | |--- continent_North America > 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- yr_of_estab > 2002.50
| | | | | | | | | | |--- yr_of_estab <= 2006.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- yr_of_estab > 2006.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | |--- requires_job_training > 0.50
| | | | | | | |--- full_time_position <= 0.50
| | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | |--- full_time_position > 0.50
| | | | | | | | |--- education_of_employee_Doctorate <= 0.50
| | | | | | | | | |--- prevailing_wage <= 41991.57
| | | | | | | | | | |--- yr_of_estab <= 1996.00
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- yr_of_estab > 1996.00
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | |--- prevailing_wage > 41991.57
| | | | | | | | | | |--- no_of_employees <= 25.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_employees > 25.00
| | | | | | | | | | | |--- truncated branch of depth 19
| | | | | | | | |--- education_of_employee_Doctorate > 0.50
| | | | | | | | | |--- no_of_employees <= 93.50
| | | | | | | | | | |--- prevailing_wage <= 96273.66
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 96273.66
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 93.50
| | | | | | | | | | |--- prevailing_wage <= 109519.62
| | | | | | | | | | | |--- weights: [0.00, 29.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 109519.62
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | |--- region_of_employment_Midwest > 0.50
| | | | | |--- yr_of_estab <= 2004.50
| | | | | | |--- yr_of_estab <= 1983.50
| | | | | | | |--- prevailing_wage <= 146230.30
| | | | | | | | |--- no_of_employees <= 42.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- no_of_employees > 42.50
| | | | | | | | | |--- prevailing_wage <= 146073.00
| | | | | | | | | | |--- prevailing_wage <= 7668.12
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- prevailing_wage > 7668.12
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | | |--- prevailing_wage > 146073.00
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- prevailing_wage > 146230.30
| | | | | | | | |--- continent_South America <= 0.50
| | | | | | | | | |--- no_of_employees <= 6771.00
| | | | | | | | | | |--- yr_of_estab <= 1979.50
| | | | | | | | | | | |--- weights: [0.00, 41.00] class: 1
| | | | | | | | | | |--- yr_of_estab > 1979.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- no_of_employees > 6771.00
| | | | | | | | | | |--- prevailing_wage <= 161508.94
| | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 161508.94
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- continent_South America > 0.50
| | | | | | | | | |--- prevailing_wage <= 205387.09
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- prevailing_wage > 205387.09
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- yr_of_estab > 1983.50
| | | | | | | |--- education_of_employee_Master's <= 0.50
| | | | | | | | |--- prevailing_wage <= 52797.89
| | | | | | | | | |--- prevailing_wage <= 49531.99
| | | | | | | | | | |--- prevailing_wage <= 12634.42
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 12634.42
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | | | |--- prevailing_wage > 49531.99
| | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | |--- prevailing_wage > 52797.89
| | | | | | | | | |--- prevailing_wage <= 84548.84
| | | | | | | | | | |--- no_of_employees <= 3941.00
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- no_of_employees > 3941.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | |--- prevailing_wage > 84548.84
| | | | | | | | | | |--- no_of_employees <= 1158.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_employees > 1158.00
| | | | | | | | | | | |--- truncated branch of depth 8
| | | | | | | |--- education_of_employee_Master's > 0.50
| | | | | | | | |--- yr_of_estab <= 1990.50
| | | | | | | | | |--- no_of_employees <= 661.00
| | | | | | | | | | |--- no_of_employees <= 203.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_employees > 203.00
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 661.00
| | | | | | | | | | |--- no_of_employees <= 1618.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_employees > 1618.00
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | |--- yr_of_estab > 1990.50
| | | | | | | | | |--- prevailing_wage <= 50985.89
| | | | | | | | | | |--- no_of_employees <= 429.00
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_employees > 429.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- prevailing_wage > 50985.89
| | | | | | | | | | |--- prevailing_wage <= 52847.39
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | | |--- prevailing_wage > 52847.39
| | | | | | | | | | | |--- truncated branch of depth 23
| | | | | |--- yr_of_estab > 2004.50
| | | | | | |--- prevailing_wage <= 9198.22
| | | | | | | |--- no_of_employees <= 1653.00
| | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 1653.00
| | | | | | | | |--- no_of_employees <= 4925.50
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | |--- no_of_employees > 4925.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- prevailing_wage > 9198.22
| | | | | | | |--- yr_of_estab <= 2014.50
| | | | | | | | |--- continent_Oceania <= 0.50
| | | | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | | | |--- no_of_employees <= 784.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_employees > 784.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | |--- requires_job_training > 0.50
| | | | | | | | | | |--- yr_of_estab <= 2010.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- yr_of_estab > 2010.50
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | |--- continent_Oceania > 0.50
| | | | | | | | | |--- no_of_employees <= 3284.50
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 3284.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- yr_of_estab > 2014.50
| | | | | | | | |--- no_of_employees <= 1386.50
| | | | | | | | | |--- prevailing_wage <= 142347.97
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | |--- prevailing_wage > 142347.97
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- no_of_employees > 1386.50
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | |--- continent_Europe > 0.50
| | | | |--- prevailing_wage <= 27125.25
| | | | | |--- no_of_employees <= 1347.50
| | | | | | |--- weights: [0.00, 49.00] class: 1
| | | | | |--- no_of_employees > 1347.50
| | | | | | |--- no_of_employees <= 1402.50
| | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- no_of_employees > 1402.50
| | | | | | | |--- no_of_employees <= 2039.00
| | | | | | | | |--- no_of_employees <= 2037.00
| | | | | | | | | |--- yr_of_estab <= 2006.50
| | | | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 19.00] class: 1
| | | | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- yr_of_estab > 2006.50
| | | | | | | | | | |--- prevailing_wage <= 9680.11
| | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | | | |--- prevailing_wage > 9680.11
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- no_of_employees > 2037.00
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 2039.00
| | | | | | | | |--- no_of_employees <= 4284.50
| | | | | | | | | |--- prevailing_wage <= 1593.03
| | | | | | | | | | |--- no_of_employees <= 3023.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- no_of_employees > 3023.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- prevailing_wage > 1593.03
| | | | | | | | | | |--- weights: [0.00, 62.00] class: 1
| | | | | | | | |--- no_of_employees > 4284.50
| | | | | | | | | |--- no_of_employees <= 4357.00
| | | | | | | | | | |--- prevailing_wage <= 25782.11
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | | |--- prevailing_wage > 25782.11
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_employees > 4357.00
| | | | | | | | | | |--- yr_of_estab <= 1995.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- yr_of_estab > 1995.50
| | | | | | | | | | | |--- weights: [0.00, 29.00] class: 1
| | | | |--- prevailing_wage > 27125.25
| | | | | |--- prevailing_wage <= 27214.77
| | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | |--- prevailing_wage > 27214.77
| | | | | | |--- no_of_employees <= 52.00
| | | | | | | |--- yr_of_estab <= 2001.50
| | | | | | | | |--- prevailing_wage <= 53593.63
| | | | | | | | | |--- prevailing_wage <= 34283.16
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- prevailing_wage > 34283.16
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | |--- prevailing_wage > 53593.63
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | |--- yr_of_estab > 2001.50
| | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | |--- no_of_employees > 52.00
| | | | | | | |--- education_of_employee_Doctorate <= 0.50
| | | | | | | | |--- prevailing_wage <= 127061.20
| | | | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | | | |--- yr_of_estab <= 1945.50
| | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1
| | | | | | | | | | |--- yr_of_estab > 1945.50
| | | | | | | | | | | |--- truncated branch of depth 17
| | | | | | | | |--- prevailing_wage > 127061.20
| | | | | | | | | |--- prevailing_wage <= 131520.57
| | | | | | | | | | |--- no_of_employees <= 3733.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- no_of_employees > 3733.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- prevailing_wage > 131520.57
| | | | | | | | | | |--- yr_of_estab <= 1979.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- yr_of_estab > 1979.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | |--- education_of_employee_Doctorate > 0.50
| | | | | | | | |--- no_of_employees <= 5583.50
| | | | | | | | | |--- no_of_employees <= 1285.50
| | | | | | | | | | |--- no_of_employees <= 1259.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | | |--- no_of_employees > 1259.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- no_of_employees > 1285.50
| | | | | | | | | | |--- prevailing_wage <= 29598.92
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- prevailing_wage > 29598.92
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | |--- no_of_employees > 5583.50
| | | | | | | | | |--- prevailing_wage <= 46579.42
| | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | | |--- prevailing_wage > 46579.42
| | | | | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| |--- has_job_experience > 0.50
| | |--- education_of_employee_Master's <= 0.50
| | | |--- education_of_employee_Doctorate <= 0.50
| | | | |--- unit_of_wage_Year <= 0.50
| | | | | |--- unit_of_wage_Week <= 0.50
| | | | | | |--- unit_of_wage_Month <= 0.50
| | | | | | | |--- no_of_employees <= 1469.50
| | | | | | | | |--- prevailing_wage <= 223248.00
| | | | | | | | | |--- weights: [14.00, 0.00] class: 0
| | | | | | | | |--- prevailing_wage > 223248.00
| | | | | | | | | |--- prevailing_wage <= 233616.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- prevailing_wage > 233616.00
| | | | | | | | | | |--- yr_of_estab <= 2011.50
| | | | | | | | | | | |--- truncated branch of depth 11
| | | | | | | | | | |--- yr_of_estab > 2011.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | |--- no_of_employees > 1469.50
| | | | | | | | |--- no_of_employees <= 2724.50
| | | | | | | | | |--- yr_of_estab <= 2001.50
| | | | | | | | | | |--- no_of_employees <= 2528.00
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | | | |--- no_of_employees > 2528.00
| | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | | |--- yr_of_estab > 2001.50
| | | | | | | | | | |--- yr_of_estab <= 2007.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- yr_of_estab > 2007.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | |--- no_of_employees > 2724.50
| | | | | | | | | |--- no_of_employees <= 2916.00
| | | | | | | | | | |--- weights: [10.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 2916.00
| | | | | | | | | | |--- yr_of_estab <= 1946.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- yr_of_estab > 1946.50
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | |--- unit_of_wage_Month > 0.50
| | | | | | | |--- yr_of_estab <= 1996.50
| | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | | | |--- weights: [0.00, 9.00] class: 1
| | | | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | | | |--- no_of_employees <= 643.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- no_of_employees > 643.00
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | |--- no_of_employees <= 427.00
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_employees > 427.00
| | | | | | | | | | |--- continent_Asia <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | | |--- continent_Asia > 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- yr_of_estab > 1996.50
| | | | | | | | |--- yr_of_estab <= 1999.50
| | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | |--- yr_of_estab > 1999.50
| | | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | | |--- no_of_employees <= 3686.50
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | | | |--- no_of_employees > 3686.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | |--- unit_of_wage_Week > 0.50
| | | | | | |--- yr_of_estab <= 1996.50
| | | | | | | |--- yr_of_estab <= 1994.00
| | | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | | |--- no_of_employees <= 2629.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_employees > 2629.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | | |--- full_time_position <= 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- full_time_position > 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- requires_job_training > 0.50
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- yr_of_estab > 1994.00
| | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | |--- yr_of_estab > 1996.50
| | | | | | | |--- full_time_position <= 0.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- full_time_position > 0.50
| | | | | | | | |--- continent_North America <= 0.50
| | | | | | | | | |--- weights: [0.00, 12.00] class: 1
| | | | | | | | |--- continent_North America > 0.50
| | | | | | | | | |--- no_of_employees <= 3051.50
| | | | | | | | | | |--- no_of_employees <= 1267.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | | |--- no_of_employees > 1267.00
| | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 3051.50
| | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | |--- unit_of_wage_Year > 0.50
| | | | | |--- continent_Europe <= 0.50
| | | | | | |--- continent_Asia <= 0.50
| | | | | | | |--- prevailing_wage <= 7522.50
| | | | | | | | |--- continent_Oceania <= 0.50
| | | | | | | | | |--- weights: [0.00, 16.00] class: 1
| | | | | | | | |--- continent_Oceania > 0.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- prevailing_wage > 7522.50
| | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | |--- no_of_employees <= 612.00
| | | | | | | | | | |--- no_of_employees <= 335.00
| | | | | | | | | | | |--- truncated branch of depth 6
| | | | | | | | | | |--- no_of_employees > 335.00
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- no_of_employees > 612.00
| | | | | | | | | | |--- no_of_employees <= 909.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- no_of_employees > 909.50
| | | | | | | | | | | |--- truncated branch of depth 20
| | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | |--- yr_of_estab <= 2011.50
| | | | | | | | | | |--- prevailing_wage <= 32908.69
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- prevailing_wage > 32908.69
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- yr_of_estab > 2011.50
| | | | | | | | | | |--- no_of_employees <= 5463.00
| | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_employees > 5463.00
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | |--- continent_Asia > 0.50
| | | | | | | |--- no_of_employees <= 24.00
| | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 24.00
| | | | | | | | |--- prevailing_wage <= 188467.90
| | | | | | | | | |--- prevailing_wage <= 1515.91
| | | | | | | | | | |--- yr_of_estab <= 2003.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | | |--- yr_of_estab > 2003.50
| | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | | |--- prevailing_wage > 1515.91
| | | | | | | | | | |--- no_of_employees <= 2424.50
| | | | | | | | | | | |--- truncated branch of depth 39
| | | | | | | | | | |--- no_of_employees > 2424.50
| | | | | | | | | | | |--- truncated branch of depth 27
| | | | | | | | |--- prevailing_wage > 188467.90
| | | | | | | | | |--- yr_of_estab <= 2004.50
| | | | | | | | | | |--- no_of_employees <= 245.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- no_of_employees > 245.50
| | | | | | | | | | | |--- truncated branch of depth 9
| | | | | | | | | |--- yr_of_estab > 2004.50
| | | | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | |--- continent_Europe > 0.50
| | | | | | |--- prevailing_wage <= 158681.25
| | | | | | | |--- yr_of_estab <= 1987.50
| | | | | | | | |--- prevailing_wage <= 125827.25
| | | | | | | | | |--- prevailing_wage <= 38093.38
| | | | | | | | | | |--- no_of_employees <= 241.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_employees > 241.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | |--- prevailing_wage > 38093.38
| | | | | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 15
| | | | | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | |--- prevailing_wage > 125827.25
| | | | | | | | | |--- weights: [0.00, 17.00] class: 1
| | | | | | | |--- yr_of_estab > 1987.50
| | | | | | | | |--- prevailing_wage <= 130823.08
| | | | | | | | | |--- no_of_employees <= 1804.50
| | | | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 7
| | | | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- no_of_employees > 1804.50
| | | | | | | | | | |--- no_of_employees <= 1816.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- no_of_employees > 1816.50
| | | | | | | | | | | |--- truncated branch of depth 13
| | | | | | | | |--- prevailing_wage > 130823.08
| | | | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | | | |--- yr_of_estab <= 2011.50
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | | |--- yr_of_estab > 2011.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | |--- prevailing_wage > 158681.25
| | | | | | | |--- prevailing_wage <= 216135.16
| | | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | | | |--- prevailing_wage <= 167895.16
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | | |--- prevailing_wage > 167895.16
| | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0
| | | | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | | |--- prevailing_wage <= 165582.23
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- prevailing_wage > 165582.23
| | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | |--- prevailing_wage > 216135.16
| | | | | | | | |--- full_time_position <= 0.50
| | | | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- requires_job_training > 0.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | |--- full_time_position > 0.50
| | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | |--- education_of_employee_Doctorate > 0.50
| | | | |--- unit_of_wage_Year <= 0.50
| | | | | |--- yr_of_estab <= 2006.50
| | | | | | |--- no_of_employees <= 2536.00
| | | | | | | |--- yr_of_estab <= 1997.50
| | | | | | | | |--- yr_of_estab <= 1964.00
| | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | |--- yr_of_estab > 1964.00
| | | | | | | | | |--- unit_of_wage_Week <= 0.50
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- unit_of_wage_Week > 0.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- yr_of_estab > 1997.50
| | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | |--- no_of_employees > 2536.00
| | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | |--- yr_of_estab <= 1996.00
| | | | | | | | | |--- weights: [0.00, 4.00] class: 1
| | | | | | | | |--- yr_of_estab > 1996.00
| | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | |--- weights: [3.00, 0.00] class: 0
| | | | | |--- yr_of_estab > 2006.50
| | | | | | |--- weights: [0.00, 7.00] class: 1
| | | | |--- unit_of_wage_Year > 0.50
| | | | | |--- requires_job_training <= 0.50
| | | | | | |--- continent_North America <= 0.50
| | | | | | | |--- region_of_employment_South <= 0.50
| | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | |--- continent_Oceania <= 0.50
| | | | | | | | | | |--- continent_South America <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 14
| | | | | | | | | | |--- continent_South America > 0.50
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | | |--- continent_Oceania > 0.50
| | | | | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | |--- weights: [0.00, 69.00] class: 1
| | | | | | | |--- region_of_employment_South > 0.50
| | | | | | | | |--- full_time_position <= 0.50
| | | | | | | | | |--- continent_Asia <= 0.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- continent_Asia > 0.50
| | | | | | | | | | |--- weights: [0.00, 8.00] class: 1
| | | | | | | | |--- full_time_position > 0.50
| | | | | | | | | |--- weights: [0.00, 147.00] class: 1
| | | | | | |--- continent_North America > 0.50
| | | | | | | |--- no_of_employees <= 242.50
| | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 242.50
| | | | | | | | |--- yr_of_estab <= 2002.50
| | | | | | | | | |--- prevailing_wage <= 107494.71
| | | | | | | | | | |--- prevailing_wage <= 103225.95
| | | | | | | | | | | |--- truncated branch of depth 10
| | | | | | | | | | |--- prevailing_wage > 103225.95
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- prevailing_wage > 107494.71
| | | | | | | | | | |--- weights: [0.00, 15.00] class: 1
| | | | | | | | |--- yr_of_estab > 2002.50
| | | | | | | | | |--- weights: [0.00, 23.00] class: 1
| | | | | |--- requires_job_training > 0.50
| | | | | | |--- yr_of_estab <= 2013.50
| | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | |--- no_of_employees <= 7072.50
| | | | | | | | | |--- no_of_employees <= 285.00
| | | | | | | | | | |--- continent_Europe <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1
| | | | | | | | | | |--- continent_Europe > 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | |--- no_of_employees > 285.00
| | | | | | | | | | |--- prevailing_wage <= 140409.80
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- prevailing_wage > 140409.80
| | | | | | | | | | | |--- truncated branch of depth 4
| | | | | | | | |--- no_of_employees > 7072.50
| | | | | | | | | |--- continent_Europe <= 0.50
| | | | | | | | | | |--- region_of_employment_Midwest <= 0.50
| | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | | | |--- region_of_employment_Midwest > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- continent_Europe > 0.50
| | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | |--- prevailing_wage <= 80474.88
| | | | | | | | | |--- prevailing_wage <= 76966.84
| | | | | | | | | | |--- yr_of_estab <= 1982.00
| | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1
| | | | | | | | | | |--- yr_of_estab > 1982.00
| | | | | | | | | | | |--- truncated branch of depth 5
| | | | | | | | | |--- prevailing_wage > 76966.84
| | | | | | | | | | |--- weights: [2.00, 0.00] class: 0
| | | | | | | | |--- prevailing_wage > 80474.88
| | | | | | | | | |--- weights: [0.00, 10.00] class: 1
| | | | | | |--- yr_of_estab > 2013.50
| | | | | | | |--- weights: [2.00, 0.00] class: 0
| | |--- education_of_employee_Master's > 0.50
| | | |--- unit_of_wage_Year <= 0.50
| | | | |--- continent_Asia <= 0.50
| | | | | |--- continent_Europe <= 0.50
| | | | | | |--- unit_of_wage_Week <= 0.50
| | | | | | | |--- no_of_employees <= 3481.00
| | | | | | | | |--- no_of_employees <= 1440.50
| | | | | | | | | |--- no_of_employees <= 341.50
| | | | | | | | | | |--- yr_of_estab <= 2011.00
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- yr_of_estab > 2011.00
| | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1
| | | | | | | | | |--- no_of_employees > 341.50
| | | | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | | | |--- truncated branch of depth 3
| | | | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | |--- no_of_employees > 1440.50
| | | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | | |--- yr_of_estab <= 2008.50
| | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0
| | | | | | | | | | |--- yr_of_estab > 2008.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | | |--- no_of_employees <= 3367.50
| | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1
| | | | | | | | | | |--- no_of_employees > 3367.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | |--- no_of_employees > 3481.00
| | | | | | | | |--- yr_of_estab <= 1940.75
| | | | | | | | | |--- no_of_employees <= 3758.50
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | | | |--- no_of_employees > 3758.50
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- yr_of_estab > 1940.75
| | | | | | | | | |--- weights: [0.00, 6.00] class: 1
| | | | | | |--- unit_of_wage_Week > 0.50
| | | | | | | |--- no_of_employees <= 4385.00
| | | | | | | | |--- no_of_employees <= 594.00
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- no_of_employees > 594.00
| | | | | | | | | |--- yr_of_estab <= 2004.00
| | | | | | | | | | |--- region_of_employment_West <= 0.50
| | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1
| | | | | | | | | | |--- region_of_employment_West > 0.50
| | | | | | | | | | | |--- truncated branch of depth 2
| | | | | | | | | |--- yr_of_estab > 2004.00
| | | | | | | | | | |--- continent_North America <= 0.50
| | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | | |--- continent_North America > 0.50
| | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | | | |--- no_of_employees > 4385.00
| | | | | | | | |--- region_of_employment_Northeast <= 0.50
| | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | |--- region_of_employment_Northeast > 0.50
| | | | | | | | | |--- yr_of_estab <= 1954.25
| | | | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | | | | |--- yr_of_estab > 1954.25
| | | | | | | | | | |--- weights: [0.00, 1.00] class: 1
| | | | | |--- continent_Europe > 0.50
| | | | | | |--- full_time_position <= 0.50
| | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | | |--- full_time_position > 0.50
| | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | |--- weights: [0.00, 14.00] class: 1
| | | | | | | |--- requires_job_training > 0.50
| | | | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | |--- continent_Asia > 0.50
| | | | | |--- no_of_employees <= 84.00
| | | | | | |--- weights: [1.00, 0.00] class: 0
| | | | | |--- no_of_employees > 84.00
| | | | | | |--- no_of_employees <= 2608.00
| | | | | | | |--- requires_job_training <= 0.50
| | | | | | | | |--- no_of_employees <= 469.00
| | | | | | | | | |--- weights: [0.00, 16.00] class: 1
| | | | | | | | 
|--- no_of_employees > 469.00 | | | | | | | | | |--- no_of_employees <= 478.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 478.50 | | | | | | | | | | |--- prevailing_wage <= 209973.79 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- prevailing_wage > 209973.79 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | |--- yr_of_estab <= 1987.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- yr_of_estab > 1987.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- no_of_employees > 2608.00 | | | | | | | |--- no_of_employees <= 2659.50 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_employees > 2659.50 | | | | | | | | |--- full_time_position <= 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- full_time_position > 0.50 | | | | | | | | | |--- no_of_employees <= 6324.00 | | | | | | | | | | |--- no_of_employees <= 4449.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_employees > 4449.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_employees > 6324.00 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | |--- unit_of_wage_Year > 0.50 | | | | |--- region_of_employment_West <= 0.50 | | | | | |--- full_time_position <= 0.50 | | | | | | |--- no_of_employees <= 132.50 | | | | | | | |--- region_of_employment_Northeast <= 0.50 | | | | | | | | |--- no_of_employees <= 58.50 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | |--- no_of_employees > 58.50 | | | | | | | | | |--- prevailing_wage <= 134699.42 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- prevailing_wage > 134699.42 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- region_of_employment_Northeast > 0.50 | | | | | 
| | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- no_of_employees > 132.50 | | | | | | | |--- prevailing_wage <= 4850.04 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- prevailing_wage > 4850.04 | | | | | | | | |--- region_of_employment_Midwest <= 0.50 | | | | | | | | | |--- prevailing_wage <= 165227.40 | | | | | | | | | | |--- prevailing_wage <= 149482.00 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- prevailing_wage > 149482.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- prevailing_wage > 165227.40 | | | | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- continent_North America > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- region_of_employment_Midwest > 0.50 | | | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | | | |--- prevailing_wage <= 113653.92 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- prevailing_wage > 113653.92 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- full_time_position > 0.50 | | | | | | |--- region_of_employment_Northeast <= 0.50 | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | |--- continent_South America <= 0.50 | | | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- continent_North America > 0.50 | | | | | | | | | | |--- no_of_employees <= 7060.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- no_of_employees > 7060.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | 
| | |--- continent_South America > 0.50 | | | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | | | |--- region_of_employment_Midwest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- region_of_employment_Midwest > 0.50 | | | | | | | | | | | |--- weights: [0.00, 10.00] class: 1 | | | | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | | | | |--- weights: [0.00, 17.00] class: 1 | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | |--- prevailing_wage <= 228499.93 | | | | | | | | | |--- yr_of_estab <= 2011.00 | | | | | | | | | | |--- no_of_employees <= 1463.50 | | | | | | | | | | | |--- weights: [0.00, 33.00] class: 1 | | | | | | | | | | |--- no_of_employees > 1463.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- yr_of_estab > 2011.00 | | | | | | | | | | |--- no_of_employees <= 239.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_employees > 239.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- prevailing_wage > 228499.93 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- region_of_employment_Northeast > 0.50 | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | |--- yr_of_estab <= 2000.50 | | | | | | | | | | |--- yr_of_estab <= 1999.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- yr_of_estab > 1999.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- yr_of_estab > 2000.50 | | | | | | | | | | |--- prevailing_wage <= 509.51 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- prevailing_wage > 509.51 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | | | |--- prevailing_wage <= 51707.83 | | | | | 
| | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- prevailing_wage > 51707.83 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- continent_North America > 0.50 | | | | | | | | |--- no_of_employees <= 114.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 114.50 | | | | | | | | | |--- yr_of_estab <= 1968.50 | | | | | | | | | | |--- no_of_employees <= 6551.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_employees > 6551.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- yr_of_estab > 1968.50 | | | | | | | | | | |--- prevailing_wage <= 1094.35 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 1094.35 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | |--- region_of_employment_West > 0.50 | | | | | |--- continent_South America <= 0.50 | | | | | | |--- continent_North America <= 0.50 | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | |--- continent_Asia <= 0.50 | | | | | | | | | |--- no_of_employees <= 96.00 | | | | | | | | | | |--- yr_of_estab <= 1997.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- yr_of_estab > 1997.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_employees > 96.00 | | | | | | | | | | |--- full_time_position <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- full_time_position > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- continent_Asia > 0.50 | | | | | | | | | |--- yr_of_estab <= 2001.50 | | | | | | | | | | |--- yr_of_estab <= 1997.50 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- yr_of_estab > 1997.50 | | | | | | | | | | | |--- 
truncated branch of depth 11 | | | | | | | | | |--- yr_of_estab > 2001.50 | | | | | | | | | | |--- prevailing_wage <= 16170.72 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- prevailing_wage > 16170.72 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | |--- no_of_employees <= 2123.00 | | | | | | | | | |--- full_time_position <= 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- full_time_position > 0.50 | | | | | | | | | | |--- prevailing_wage <= 55312.33 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 55312.33 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_employees > 2123.00 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- continent_North America > 0.50 | | | | | | | |--- yr_of_estab <= 2011.50 | | | | | | | | |--- prevailing_wage <= 10972.25 | | | | | | | | | |--- yr_of_estab <= 1962.00 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- yr_of_estab > 1962.00 | | | | | | | | | | |--- full_time_position <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- full_time_position > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- prevailing_wage > 10972.25 | | | | | | | | | |--- prevailing_wage <= 79129.82 | | | | | | | | | | |--- no_of_employees <= 1663.00 | | | | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | | | | |--- no_of_employees > 1663.00 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | |--- prevailing_wage > 79129.82 | | | | | | | | | | |--- prevailing_wage <= 131951.44 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- prevailing_wage > 131951.44 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- yr_of_estab > 2011.50 | | | | | | | 
| |--- weights: [2.00, 0.00] class: 0 | | | | | |--- continent_South America > 0.50 | | | | | | |--- prevailing_wage <= 32867.22 | | | | | | | |--- yr_of_estab <= 1998.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- yr_of_estab > 1998.50 | | | | | | | | |--- yr_of_estab <= 2004.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 2004.00 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- prevailing_wage > 32867.22 | | | | | | | |--- yr_of_estab <= 1998.50 | | | | | | | | |--- prevailing_wage <= 113576.45 | | | | | | | | | |--- prevailing_wage <= 59437.28 | | | | | | | | | | |--- no_of_employees <= 1648.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- no_of_employees > 1648.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- prevailing_wage > 59437.28 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- prevailing_wage > 113576.45 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- yr_of_estab > 1998.50 | | | | | | | | |--- weights: [6.00, 0.00] class: 0 |--- education_of_employee_High School > 0.50 | |--- continent_Asia <= 0.50 | | |--- continent_Europe <= 0.50 | | | |--- has_job_experience <= 0.50 | | | | |--- yr_of_estab <= 1968.00 | | | | | |--- yr_of_estab <= 1961.50 | | | | | | |--- yr_of_estab <= 1946.50 | | | | | | | |--- prevailing_wage <= 46697.76 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- prevailing_wage > 46697.76 | | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | | |--- prevailing_wage <= 57871.71 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- prevailing_wage > 57871.71 | | | | | | | | | | |--- no_of_employees <= 5129.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- no_of_employees > 5129.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 
| | | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- yr_of_estab > 1946.50 | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- yr_of_estab > 1961.50 | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | |--- yr_of_estab > 1968.00 | | | | | |--- region_of_employment_Northeast <= 0.50 | | | | | | |--- yr_of_estab <= 2009.50 | | | | | | | |--- yr_of_estab <= 2006.50 | | | | | | | | |--- no_of_employees <= 2553.50 | | | | | | | | | |--- no_of_employees <= 1365.50 | | | | | | | | | | |--- yr_of_estab <= 1996.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- yr_of_estab > 1996.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_employees > 1365.50 | | | | | | | | | | |--- yr_of_estab <= 1976.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- yr_of_estab > 1976.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- no_of_employees > 2553.50 | | | | | | | | | |--- no_of_employees <= 2666.00 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- no_of_employees > 2666.00 | | | | | | | | | | |--- yr_of_estab <= 1997.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- yr_of_estab > 1997.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- yr_of_estab > 2006.50 | | | | | | | | |--- continent_Oceania <= 0.50 | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- continent_Oceania > 0.50 | | | | | | | | | |--- no_of_employees <= 767.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 767.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- yr_of_estab > 
2009.50 | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | |--- yr_of_estab <= 2012.00 | | | | | | | | | |--- unit_of_wage_Year <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- unit_of_wage_Year > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- yr_of_estab > 2012.00 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- region_of_employment_Northeast > 0.50 | | | | | | |--- no_of_employees <= 6664.50 | | | | | | | |--- no_of_employees <= 3089.00 | | | | | | | | |--- no_of_employees <= 1365.00 | | | | | | | | | |--- no_of_employees <= 1248.50 | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 1248.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_employees > 1365.00 | | | | | | | | | |--- weights: [23.00, 0.00] class: 0 | | | | | | | |--- no_of_employees > 3089.00 | | | | | | | | |--- prevailing_wage <= 90632.39 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | |--- prevailing_wage > 90632.39 | | | | | | | | | |--- no_of_employees <= 3927.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- no_of_employees > 3927.00 | | | | | | | | | | |--- prevailing_wage <= 116871.55 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 116871.55 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | |--- no_of_employees > 6664.50 | | | | | | | |--- prevailing_wage <= 141638.03 | | | | | | | | |--- yr_of_estab <= 1993.00 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1993.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- prevailing_wage > 141638.03 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- 
has_job_experience > 0.50 | | | | |--- prevailing_wage <= 21595.07 | | | | | |--- unit_of_wage_Year <= 0.50 | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- unit_of_wage_Year > 0.50 | | | | | | |--- continent_South America <= 0.50 | | | | | | | |--- no_of_employees <= 6480.50 | | | | | | | | |--- weights: [0.00, 24.00] class: 1 | | | | | | | |--- no_of_employees > 6480.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- continent_South America > 0.50 | | | | | | | |--- yr_of_estab <= 1998.50 | | | | | | | | |--- no_of_employees <= 3115.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 3115.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 1998.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- prevailing_wage > 21595.07 | | | | | |--- prevailing_wage <= 24023.89 | | | | | | |--- no_of_employees <= 1241.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_employees > 1241.00 | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | |--- prevailing_wage > 24023.89 | | | | | | |--- no_of_employees <= 473.00 | | | | | | | |--- prevailing_wage <= 160790.63 | | | | | | | | |--- continent_North America <= 0.50 | | | | | | | | | |--- no_of_employees <= 228.00 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 228.00 | | | | | | | | | | |--- region_of_employment_Midwest <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- region_of_employment_Midwest > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- continent_North America > 0.50 | | | | | | | | | |--- no_of_employees <= 185.50 | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- no_of_employees > 185.50 | | | | | | | | | | |--- yr_of_estab <= 2009.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | 
|--- yr_of_estab > 2009.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- prevailing_wage > 160790.63 | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- no_of_employees > 473.00 | | | | | | | |--- no_of_employees <= 981.50 | | | | | | | | |--- yr_of_estab <= 1995.00 | | | | | | | | | |--- no_of_employees <= 671.00 | | | | | | | | | | |--- no_of_employees <= 545.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 545.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 671.00 | | | | | | | | | | |--- region_of_employment_South <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | | |--- region_of_employment_South > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1995.00 | | | | | | | | | |--- weights: [0.00, 16.00] class: 1 | | | | | | | |--- no_of_employees > 981.50 | | | | | | | | |--- yr_of_estab <= 2009.00 | | | | | | | | | |--- prevailing_wage <= 126562.11 | | | | | | | | | | |--- yr_of_estab <= 1946.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- yr_of_estab > 1946.00 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- prevailing_wage > 126562.11 | | | | | | | | | | |--- no_of_employees <= 2743.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- no_of_employees > 2743.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- yr_of_estab > 2009.00 | | | | | | | | | |--- yr_of_estab <= 2013.50 | | | | | | | | | | |--- no_of_employees <= 2544.00 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | | |--- no_of_employees > 2544.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- yr_of_estab > 2013.50 | | | | | | | | | | |--- no_of_employees <= 3118.00 | | | | | | | | | | | 
|--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 3118.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | |--- continent_Europe > 0.50 | | | |--- region_of_employment_West <= 0.50 | | | | |--- prevailing_wage <= 4025.76 | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | |--- prevailing_wage > 4025.76 | | | | | |--- prevailing_wage <= 5798.96 | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- prevailing_wage > 5798.96 | | | | | | |--- yr_of_estab <= 1935.00 | | | | | | | |--- no_of_employees <= 125.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- no_of_employees > 125.50 | | | | | | | | |--- prevailing_wage <= 142231.16 | | | | | | | | | |--- prevailing_wage <= 37092.47 | | | | | | | | | | |--- prevailing_wage <= 14228.38 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 14228.38 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- prevailing_wage > 37092.47 | | | | | | | | | | |--- no_of_employees <= 308.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_employees > 308.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- prevailing_wage > 142231.16 | | | | | | | | | |--- no_of_employees <= 2656.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- no_of_employees > 2656.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- yr_of_estab > 1935.00 | | | | | | | |--- yr_of_estab <= 1966.50 | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | |--- no_of_employees <= 5301.00 | | | | | | | | | | |--- no_of_employees <= 3015.50 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | | |--- no_of_employees > 3015.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_employees > 5301.00 | | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- yr_of_estab > 1966.50 | | | | | | | | |--- prevailing_wage <= 46240.11 | | | | | | | | | |--- yr_of_estab <= 1992.50 | | | | | | | | | | |--- yr_of_estab <= 1976.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- yr_of_estab > 1976.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- yr_of_estab > 1992.50 | | | | | | | | | | |--- no_of_employees <= 916.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- no_of_employees > 916.00 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- prevailing_wage > 46240.11 | | | | | | | | | |--- yr_of_estab <= 1985.50 | | | | | | | | | | |--- region_of_employment_Midwest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- region_of_employment_Midwest > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- yr_of_estab > 1985.50 | | | | | | | | | | |--- prevailing_wage <= 52136.59 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- prevailing_wage > 52136.59 | | | | | | | | | | | |--- truncated branch of depth 15 | | | |--- region_of_employment_West > 0.50 | | | | |--- no_of_employees <= 3180.00 | | | | | |--- prevailing_wage <= 2497.47 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- prevailing_wage > 2497.47 | | | | | | |--- no_of_employees <= 64.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_employees > 64.50 | | | | | | | |--- yr_of_estab <= 2006.50 | | | | | | | | |--- yr_of_estab <= 2005.50 | | | | | | | | | |--- no_of_employees <= 479.50 | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 479.50 | | | | | | | | | | |--- prevailing_wage <= 97411.21 | | | | | | | | | | | |--- truncated branch of depth 9 | 
| | | | | | | | | |--- prevailing_wage > 97411.21 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 2005.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 2006.50 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | |--- no_of_employees > 3180.00 | | | | | |--- no_of_employees <= 3331.50 | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- no_of_employees > 3331.50 | | | | | | |--- yr_of_estab <= 1975.50 | | | | | | | |--- yr_of_estab <= 1954.50 | | | | | | | | |--- no_of_employees <= 5479.00 | | | | | | | | | |--- prevailing_wage <= 38981.93 | | | | | | | | | | |--- prevailing_wage <= 18602.95 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- prevailing_wage > 18602.95 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- prevailing_wage > 38981.93 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- no_of_employees > 5479.00 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- yr_of_estab > 1954.50 | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- yr_of_estab > 1975.50 | | | | | | | |--- yr_of_estab <= 2011.00 | | | | | | | | |--- no_of_employees <= 5369.50 | | | | | | | | | |--- yr_of_estab <= 2004.00 | | | | | | | | | | |--- prevailing_wage <= 83756.20 | | | | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 83756.20 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- yr_of_estab > 2004.00 | | | | | | | | | | |--- prevailing_wage <= 113804.24 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- prevailing_wage > 113804.24 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 5369.50 | | | | | | | | | |--- yr_of_estab <= 1993.00 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | 
| |--- yr_of_estab > 1993.00 | | | | | | | | | | |--- prevailing_wage <= 36883.51 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 36883.51 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- yr_of_estab > 2011.00 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | |--- continent_Asia > 0.50 | | |--- region_of_employment_West <= 0.50 | | | |--- region_of_employment_Northeast <= 0.50 | | | | |--- prevailing_wage <= 5379.81 | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | |--- prevailing_wage > 5379.81 | | | | | |--- yr_of_estab <= 1996.50 | | | | | | |--- yr_of_estab <= 1983.50 | | | | | | | |--- no_of_employees <= 312.50 | | | | | | | | |--- no_of_employees <= 81.50 | | | | | | | | | |--- prevailing_wage <= 67099.56 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- prevailing_wage > 67099.56 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 81.50 | | | | | | | | | |--- prevailing_wage <= 95508.68 | | | | | | | | | | |--- prevailing_wage <= 50513.11 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- prevailing_wage > 50513.11 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- prevailing_wage > 95508.68 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- no_of_employees > 312.50 | | | | | | | | |--- no_of_employees <= 935.00 | | | | | | | | | |--- no_of_employees <= 699.00 | | | | | | | | | | |--- no_of_employees <= 611.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_employees > 611.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- no_of_employees > 699.00 | | | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 935.00 | | | | | | | | | |--- no_of_employees <= 1278.50 | | | | | | | | | | |--- yr_of_estab <= 1971.50 | | | 
| | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- yr_of_estab > 1971.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 1278.50 | | | | | | | | | | |--- no_of_employees <= 1327.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_employees > 1327.00 | | | | | | | | | | | |--- truncated branch of depth 22 | | | | | | |--- yr_of_estab > 1983.50 | | | | | | | |--- no_of_employees <= 1616.50 | | | | | | | | |--- no_of_employees <= 28.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_employees > 28.00 | | | | | | | | | |--- no_of_employees <= 254.00 | | | | | | | | | | |--- no_of_employees <= 189.00 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_employees > 189.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_employees > 254.00 | | | | | | | | | | |--- prevailing_wage <= 32000.36 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- prevailing_wage > 32000.36 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | |--- no_of_employees > 1616.50 | | | | | | | | |--- yr_of_estab <= 1992.50 | | | | | | | | | |--- no_of_employees <= 5357.00 | | | | | | | | | | |--- prevailing_wage <= 55748.05 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- prevailing_wage > 55748.05 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- no_of_employees > 5357.00 | | | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1992.50 | | | | | | | | | |--- prevailing_wage <= 133422.91 | | | | | | | | | | |--- no_of_employees <= 1760.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] 
class: 1 | | | | | | | | | | |--- no_of_employees > 1760.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- prevailing_wage > 133422.91 | | | | | | | | | | |--- unit_of_wage_Year <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- unit_of_wage_Year > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | |--- yr_of_estab > 1996.50 | | | | | | |--- no_of_employees <= 588.50 | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | |--- prevailing_wage <= 19372.01 | | | | | | | | | |--- prevailing_wage <= 12600.46 | | | | | | | | | | |--- prevailing_wage <= 6978.48 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- prevailing_wage > 6978.48 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- prevailing_wage > 12600.46 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- prevailing_wage > 19372.01 | | | | | | | | | |--- no_of_employees <= 505.00 | | | | | | | | | | |--- no_of_employees <= 495.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- no_of_employees > 495.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_employees > 505.00 | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | |--- prevailing_wage <= 83324.25 | | | | | | | | | |--- no_of_employees <= 365.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 365.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- prevailing_wage > 83324.25 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- no_of_employees > 588.50 | | | | | | | |--- no_of_employees <= 629.00 | | | | | | | | |--- yr_of_estab <= 1997.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 1997.50 | | | | | | | | | 
|--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- no_of_employees > 629.00 | | | | | | | | |--- has_job_experience <= 0.50 | | | | | | | | | |--- prevailing_wage <= 66641.22 | | | | | | | | | | |--- yr_of_estab <= 2000.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- yr_of_estab > 2000.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- prevailing_wage > 66641.22 | | | | | | | | | | |--- prevailing_wage <= 87682.91 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- prevailing_wage > 87682.91 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- has_job_experience > 0.50 | | | | | | | | | |--- no_of_employees <= 5267.00 | | | | | | | | | | |--- no_of_employees <= 4562.00 | | | | | | | | | | | |--- truncated branch of depth 16 | | | | | | | | | | |--- no_of_employees > 4562.00 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- no_of_employees > 5267.00 | | | | | | | | | | |--- yr_of_estab <= 2001.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- yr_of_estab > 2001.50 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | |--- region_of_employment_Northeast > 0.50 | | | | |--- full_time_position <= 0.50 | | | | | |--- weights: [32.00, 0.00] class: 0 | | | | |--- full_time_position > 0.50 | | | | | |--- unit_of_wage_Year <= 0.50 | | | | | | |--- requires_job_training <= 0.50 | | | | | | | |--- yr_of_estab <= 1999.50 | | | | | | | | |--- yr_of_estab <= 1977.50 | | | | | | | | | |--- no_of_employees <= 546.00 | | | | | | | | | | |--- no_of_employees <= 261.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_employees > 261.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_employees > 546.00 | | | | | | | | | | |--- yr_of_estab <= 1975.50 | | | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | 
| | | | | |--- yr_of_estab > 1975.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- yr_of_estab > 1977.50 | | | | | | | | | |--- weights: [25.00, 0.00] class: 0 | | | | | | | |--- yr_of_estab > 1999.50 | | | | | | | | |--- prevailing_wage <= 94310.40 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- prevailing_wage > 94310.40 | | | | | | | | | |--- yr_of_estab <= 2002.00 | | | | | | | | | | |--- no_of_employees <= 2761.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- no_of_employees > 2761.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- yr_of_estab > 2002.00 | | | | | | | | | | |--- yr_of_estab <= 2012.50 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | | | |--- yr_of_estab > 2012.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- requires_job_training > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- unit_of_wage_Year > 0.50 | | | | | | |--- no_of_employees <= 139.50 | | | | | | | |--- yr_of_estab <= 2011.00 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | |--- yr_of_estab > 2011.00 | | | | | | | | |--- prevailing_wage <= 51519.17 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- prevailing_wage > 51519.17 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- no_of_employees > 139.50 | | | | | | | |--- prevailing_wage <= 44895.46 | | | | | | | | |--- no_of_employees <= 277.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- no_of_employees > 277.00 | | | | | | | | | |--- no_of_employees <= 3014.00 | | | | | | | | | | |--- no_of_employees <= 2068.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_employees > 2068.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- no_of_employees > 3014.00 | | | | | | | | | | |--- 
yr_of_estab <= 1979.50 | | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | | | |--- yr_of_estab > 1979.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- prevailing_wage > 44895.46 | | | | | | | | |--- prevailing_wage <= 87325.16 | | | | | | | | | |--- yr_of_estab <= 2008.50 | | | | | | | | | | |--- no_of_employees <= 591.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- no_of_employees > 591.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- yr_of_estab > 2008.50 | | | | | | | | | | |--- prevailing_wage <= 85202.47 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- prevailing_wage > 85202.47 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- prevailing_wage > 87325.16 | | | | | | | | | |--- prevailing_wage <= 143328.84 | | | | | | | | | | |--- no_of_employees <= 172.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 172.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- prevailing_wage > 143328.84 | | | | | | | | | | |--- no_of_employees <= 2869.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- no_of_employees > 2869.00 | | | | | | | | | | | |--- truncated branch of depth 5 | | |--- region_of_employment_West > 0.50 | | | |--- no_of_employees <= 5548.50 | | | | |--- no_of_employees <= 512.50 | | | | | |--- prevailing_wage <= 108693.75 | | | | | | |--- full_time_position <= 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- full_time_position > 0.50 | | | | | | | |--- prevailing_wage <= 14820.68 | | | | | | | | |--- prevailing_wage <= 11777.77 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- prevailing_wage > 11777.77 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- prevailing_wage > 14820.68 | | | | | | | | |--- 
yr_of_estab <= 2010.00 | | | | | | | | | |--- yr_of_estab <= 1970.00 | | | | | | | | | | |--- no_of_employees <= 228.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- no_of_employees > 228.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- yr_of_estab > 1970.00 | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | |--- yr_of_estab > 2010.00 | | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- prevailing_wage > 108693.75 | | | | | | |--- yr_of_estab <= 2013.00 | | | | | | | |--- no_of_employees <= 98.50 | | | | | | | | |--- has_job_experience <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- has_job_experience > 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- no_of_employees > 98.50 | | | | | | | | |--- yr_of_estab <= 2004.00 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- yr_of_estab > 2004.00 | | | | | | | | | |--- no_of_employees <= 479.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- no_of_employees > 479.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- yr_of_estab > 2013.00 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- no_of_employees > 512.50 | | | | | |--- yr_of_estab <= 2011.50 | | | | | | |--- yr_of_estab <= 1933.75 | | | | | | | |--- no_of_employees <= 1003.00 | | | | | | | | |--- no_of_employees <= 957.50 | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 957.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_employees > 1003.00 | | | | | | | | |--- weights: [28.00, 0.00] class: 0 | | | | | | |--- yr_of_estab > 1933.75 | | | | | | | |--- yr_of_estab <= 
1936.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 1936.00 | | | | | | | | |--- no_of_employees <= 1293.00 | | | | | | | | | |--- no_of_employees <= 884.50 | | | | | | | | | | |--- no_of_employees <= 805.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | | |--- no_of_employees > 805.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- no_of_employees > 884.50 | | | | | | | | | | |--- weights: [23.00, 0.00] class: 0 | | | | | | | | |--- no_of_employees > 1293.00 | | | | | | | | | |--- no_of_employees <= 1319.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- no_of_employees > 1319.00 | | | | | | | | | | |--- prevailing_wage <= 11372.00 | | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | | | |--- prevailing_wage > 11372.00 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | |--- yr_of_estab > 2011.50 | | | | | | |--- no_of_employees <= 1382.50 | | | | | | | |--- yr_of_estab <= 2012.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 2012.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- no_of_employees > 1382.50 | | | | | | | |--- no_of_employees <= 2889.00 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- no_of_employees > 2889.00 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | |--- no_of_employees > 5548.50 | | | | |--- no_of_employees <= 6854.00 | | | | | |--- yr_of_estab <= 2005.00 | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | |--- yr_of_estab > 2005.00 | | | | | | |--- prevailing_wage <= 28867.76 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- prevailing_wage > 28867.76 | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- no_of_employees > 6854.00 | | | | | |--- yr_of_estab <= 1989.50 | | | | | | |--- yr_of_estab <= 1971.50 | | | | | | | |--- weights: [3.00, 0.00] class: 
0 | | | | | | |--- yr_of_estab > 1971.50 | | | | | | | |--- prevailing_wage <= 92896.77 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- prevailing_wage > 92896.77 | | | | | | | | |--- requires_job_training <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- requires_job_training > 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- yr_of_estab > 1989.50 | | | | | | |--- prevailing_wage <= 56150.82 | | | | | | | |--- yr_of_estab <= 2002.00 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- yr_of_estab > 2002.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- prevailing_wage > 56150.82 | | | | | | | |--- weights: [9.00, 0.00] class: 0
importance_of_attributes = pd.DataFrame(dTree.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False)
importance_of_attributes
| Feature | Imp |
|---|---|
| no_of_employees | 0.263516 |
| prevailing_wage | 0.240149 |
| yr_of_estab | 0.160512 |
| education_of_employee_High School | 0.078179 |
| has_job_experience | 0.045457 |
| unit_of_wage_Year | 0.030819 |
| education_of_employee_Master's | 0.021166 |
| continent_Europe | 0.018981 |
| full_time_position | 0.018044 |
| requires_job_training | 0.017260 |
| education_of_employee_Doctorate | 0.015964 |
| region_of_employment_Midwest | 0.015637 |
| region_of_employment_South | 0.014583 |
| region_of_employment_Northeast | 0.014157 |
| continent_Asia | 0.012366 |
| region_of_employment_West | 0.012364 |
| continent_North America | 0.009700 |
| continent_South America | 0.004952 |
| continent_Oceania | 0.002575 |
| unit_of_wage_Week | 0.002449 |
| unit_of_wage_Month | 0.001170 |
importances = dTree.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='green', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
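According to the decision tree model, no_of_employees is the most important variable for predicting case_status (importance 0.26), followed by prevailing_wage (0.24) and yr_of_estab (0.16).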
The deeper the tree, the more complex the model: each additional split captures more detail of the training data, which is one cause of overfitting. Let's limit the tree to a maximum depth of 3.
d_Tree_limit3 = DecisionTreeClassifier(criterion = 'gini',max_depth=3,random_state=1)
d_Tree_limit3.fit(X_train, y_train)
DecisionTreeClassifier(max_depth=3, random_state=1)
make_confusion_matrix_for_model(d_Tree_limit3, y_test)
#Using above defined function to get accuracy, recall and precision on train and test set
dTree_score_limit3=get_metrics_score(d_Tree_limit3)
Accuracy on training set : 0.7298161022650819
Accuracy on test set : 0.7243589743589743
Recall on training set : 0.926970536388819
Recall on test set : 0.929285014691479
Precision on training set : 0.7365928495197439
Precision on test set : 0.7309707241910631
f1-score on training set : 0.820888310722914
f1-score on test set : 0.8182837429926693
plt.figure(figsize=(15,10))
tree.plot_tree(d_Tree_limit3,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
#Text Format
print(tree.export_text(d_Tree_limit3,feature_names=feature_names,show_weights=True))
|--- education_of_employee_High School <= 0.50
|   |--- has_job_experience <= 0.50
|   |   |--- unit_of_wage_Year <= 0.50
|   |   |   |--- weights: [584.00, 284.00] class: 0
|   |   |--- unit_of_wage_Year > 0.50
|   |   |   |--- weights: [2013.00, 3599.00] class: 1
|   |--- has_job_experience > 0.50
|   |   |--- education_of_employee_Master's <= 0.50
|   |   |   |--- weights: [1292.00, 3695.00] class: 1
|   |   |--- education_of_employee_Master's > 0.50
|   |   |   |--- weights: [427.00, 3524.00] class: 1
|--- education_of_employee_High School > 0.50
|   |--- continent_Asia <= 0.50
|   |   |--- continent_Europe <= 0.50
|   |   |   |--- weights: [217.00, 225.00] class: 1
|   |   |--- continent_Europe > 0.50
|   |   |   |--- weights: [231.00, 120.00] class: 0
|   |--- continent_Asia > 0.50
|   |   |--- region_of_employment_West <= 0.50
|   |   |   |--- weights: [897.00, 405.00] class: 0
|   |   |--- region_of_employment_West > 0.50
|   |   |   |--- weights: [262.00, 61.00] class: 0
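To read the leaf weights printed above, it can help to turn them into impurity values. The following is a minimal sketch and not part of the original notebook: `gini_impurity` is a hypothetical helper implementing the standard Gini formula 1 - sum(p_k^2), and the `leaves` list simply copies the eight `weights` pairs from the depth-3 tree.

```python
def gini_impurity(weights):
    """Gini impurity 1 - sum(p_k^2) for a list of per-class sample counts."""
    total = sum(weights)
    if total == 0:
        return 0.0
    return 1.0 - sum((w / total) ** 2 for w in weights)


# [class 0 count, class 1 count] for each leaf, copied from the export_text output
leaves = [
    [584, 284], [2013, 3599], [1292, 3695], [427, 3524],
    [217, 225], [231, 120], [897, 405], [262, 61],
]

for weights in leaves:
    predicted = weights.index(max(weights))  # majority class in the leaf
    print(f"weights={weights} class={predicted} gini={gini_impurity(weights):.3f}")
```

A value near 0 means the leaf is almost pure, while a value near 0.5 (the maximum for two classes) means the leaf is close to an even mix, so its prediction is only slightly better than a coin flip.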
# importance of features of tree building
importance_of_the_features = pd.DataFrame(d_Tree_limit3.feature_importances_, columns = ["Imp"], index = X_train.columns).sort_values(by = 'Imp', ascending = False)
importance_of_the_features
| Feature | Imp |
|---|---|
| education_of_employee_High School | 0.500464 |
| has_job_experience | 0.264116 |
| unit_of_wage_Year | 0.120021 |
| education_of_employee_Master's | 0.081335 |
| continent_Asia | 0.018963 |
| continent_Europe | 0.008847 |
| region_of_employment_West | 0.006254 |
| yr_of_estab | 0.000000 |
| unit_of_wage_Week | 0.000000 |
| unit_of_wage_Month | 0.000000 |
| region_of_employment_South | 0.000000 |
| region_of_employment_Northeast | 0.000000 |
| region_of_employment_Midwest | 0.000000 |
| no_of_employees | 0.000000 |
| prevailing_wage | 0.000000 |
| education_of_employee_Doctorate | 0.000000 |
| requires_job_training | 0.000000 |
| continent_Oceania | 0.000000 |
| continent_North America | 0.000000 |
| full_time_position | 0.000000 |
| continent_South America | 0.000000 |
importances_of_d_Tree_limit3 = d_Tree_limit3.feature_importances_
indices = np.argsort(importances_of_d_Tree_limit3)
plt.figure(figsize=(10,10))
plt.title('Feature Importances of d_Tree')
plt.barh(range(len(indices)), importances_of_d_Tree_limit3[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance of d_Tree')
plt.show()
# Choose the type of classifier.
estimator3 = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {'max_depth': np.arange(1,10),
'min_samples_leaf': [1, 2, 4, 7, 9],
"criterion": ["entropy", "gini"],
'min_impurity_decrease': [0.001,0.01,0.1]
}
# Scoring used to compare parameter combinations (recall, despite the variable name)
acc_scorer_value = metrics.make_scorer(metrics.recall_score)
# grid search
grid_obj = GridSearchCV(estimator3, parameters, scoring=acc_scorer_value,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator3 = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator3.fit(X_train, y_train)
DecisionTreeClassifier(criterion='entropy', max_depth=1,
min_impurity_decrease=0.1, random_state=1)
make_confusion_matrix_for_model(estimator3,y_test)
#Using above defined function to get accuracy, recall and precision on train and test set
# Note: this reuses the name dTree_score_limit3, so it overwrites the scores of the depth-3 tree
dTree_score_limit3=get_metrics_score(estimator3)
Accuracy on training set : 0.6679188158779995
Accuracy on test set : 0.6678440607012036
Recall on training set : 1.0
Recall on test set : 1.0
Precision on training set : 0.6679188158779995
Precision on test set : 0.6678440607012036
f1-score on training set : 0.8009008706174997
f1-score on test set : 0.8008471252647267
plt.figure(figsize=(15,10))
tree.plot_tree(estimator3,feature_names=feature_names,filled=True,fontsize=9,node_ids=True,class_names=True)
plt.show()
#base_estimator for bagging classifier is a decision tree by default
baggingClassifier_estimator=BaggingClassifier(random_state=1)
baggingClassifier_estimator.fit(X_train,y_train)
BaggingClassifier(random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
bagging_estimator_score=get_metrics_score(baggingClassifier_estimator)
Accuracy on training set : 0.9850302758466024
Accuracy on test set : 0.696493982208268
Recall on training set : 0.9869050616973055
Recall on test set : 0.7737512242899118
Precision on training set : 0.9906471183013145
Precision on test set : 0.772238514173998
f1-score on training set : 0.9887725495143183
f1-score on test set : 0.7729941291585127
make_confusion_matrix_for_model(baggingClassifier_estimator,y_test)
#Train the random forest classifier
rfClassifier_estimator=RandomForestClassifier(random_state=1)
rfClassifier_estimator.fit(X_train,y_train)
RandomForestClassifier(random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
rf_estimator_score=get_metrics_score(rfClassifier_estimator)
Accuracy on training set : 1.0
Accuracy on test set : 0.7217425431711146
Recall on training set : 1.0
Recall on test set : 0.8366307541625857
Precision on training set : 1.0
Precision on test set : 0.7676132278936018
f1-score on training set : 1.0
f1-score on test set : 0.8006373605773738
make_confusion_matrix_for_model(rfClassifier_estimator,y_test)
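The default bagging and random forest models above fit the training set almost perfectly but score much lower on the test set. As a rough sketch (not part of the original notebook), the gap can be quantified from the accuracy figures already printed; `generalization_gap` and the dictionary labels are hypothetical names chosen for illustration.

```python
# Accuracy figures (train, test) copied from the printed outputs above
reported_accuracy = {
    "decision tree (max_depth=3)": (0.7298161022650819, 0.7243589743589743),
    "bagging (default)": (0.9850302758466024, 0.696493982208268),
    "random forest (default)": (1.0, 0.7217425431711146),
}


def generalization_gap(train_score, test_score):
    """Difference between training and test score; large values suggest overfitting."""
    return train_score - test_score


gaps = {name: generalization_gap(tr, te) for name, (tr, te) in reported_accuracy.items()}

# Print the models from largest to smallest gap
for name, gap in sorted(gaps.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: gap = {gap:.3f}")
```

The depth-limited tree generalizes almost without loss, while the two untuned ensembles lose more than a quarter of their training accuracy on unseen data, which motivates the tuning that follows.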
# Choose the type of classifier.
bagging_estimator_tuned = BaggingClassifier(random_state=1)
# Grid of parameters to choose from
# Note: the integer 1 is read as "one feature" / "one sample", not 100%; use 1.0 to mean the full set
parameters = {
'max_features': [0.7,0.8,0.9,1],
'n_estimators' : [10,20,30,40,50],
'max_samples': [0.7,0.8,0.9,1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(bagging_estimator_tuned, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
bagging_estimator_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
bagging_estimator_tuned.fit(X_train, y_train)
BaggingClassifier(max_features=0.7, max_samples=1, n_estimators=20,
random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
bagging_estimator_tuned_score=get_metrics_score(bagging_estimator_tuned)
Accuracy on training set : 0.6679188158779995
Accuracy on test set : 0.6678440607012036
Recall on training set : 1.0
Recall on test set : 1.0
Precision on training set : 0.6679188158779995
Precision on test set : 0.6678440607012036
f1-score on training set : 0.8009008706174997
f1-score on test set : 0.8008471252647267
make_confusion_matrix_for_model(bagging_estimator_tuned,y_test)
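After hyperparameter tuning, recall on both the training and test sets reached 1.0, but accuracy and precision both dropped to about 0.668, which is the share of class 1 in the data. The selected model therefore predicts class 1 for every case, so the perfect recall is not a real improvement. A likely cause is the selected max_samples=1: as an integer, it makes each base estimator train on a single sample rather than on 100% of the data.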
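A recall of 1.0 combined with accuracy equal to precision is the signature of a model that labels every case as class 1. The sketch below checks this arithmetic with the overall class counts that `y.value_counts()` reports later in this notebook; treating class 1 as the positive class and using the full-data counts instead of the exact train/test split are assumptions made for illustration.

```python
# Overall class counts from y.value_counts() (assumed: 1 = positive class)
n_pos, n_neg = 17018, 8462

# Confusion-matrix cells for a model that predicts class 1 for every case
tp, fp = n_pos, n_neg  # every positive is caught, every negative is a false positive
fn, tn = 0, 0

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.4f} recall={recall:.4f} precision={precision:.4f} f1={f1:.4f}")
```

The resulting values are close to the 0.668 accuracy and precision and the 0.801 f1-score printed above, which supports reading those results as an always-positive classifier rather than a well-tuned one.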
bagging_lr=BaggingClassifier(base_estimator=LogisticRegression(solver='liblinear',random_state=1,max_iter=1000),random_state=1)
bagging_lr.fit(X_train,y_train)
BaggingClassifier(base_estimator=LogisticRegression(max_iter=1000,
random_state=1,
solver='liblinear'),
random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
bagging_lr_score=get_metrics_score(bagging_lr)
Accuracy on training set : 0.6679188158779995
Accuracy on test set : 0.6678440607012036
Recall on training set : 1.0
Recall on test set : 1.0
Precision on training set : 0.6679188158779995
Precision on test set : 0.6678440607012036
f1-score on training set : 0.8009008706174997
f1-score on test set : 0.8008471252647267
make_confusion_matrix_for_model(bagging_lr,y_test)
Now, let's see if we can get a better model by tuning the random forest classifier. Two of the important hyperparameters available for the random forest classifier are min_samples_leaf and max_features.
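When building grids for size parameters such as max_features, the type of each value matters: scikit-learn documents that an int is taken as an absolute count, while a float is taken as a fraction of the total. The helper below is a hypothetical, simplified imitation of that rule (`max(1, int(fraction * n_total))` for floats), written only to illustrate the difference between `1` and `1.0`; it is not scikit-learn's own code.

```python
def resolve_count(value, n_total):
    """Imitate how an int-or-float size parameter is interpreted.

    int   -> absolute count
    float -> fraction of n_total, rounded down, but at least 1
    """
    if isinstance(value, int):
        return value
    return max(1, int(value * n_total))


n_features = 21  # number of rows in the feature importance tables above

for value in [0.2, 0.5, 1.0, 1]:
    print(f"max_features={value!r} -> {resolve_count(value, n_features)} features per split")
```

With 21 features, `1.0` keeps all of them while the integer `1` keeps a single feature, so a grid such as `[0.7, 0.8, 0.9, 1]` mixes fractions with a count of one.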
# Choose the type of classifier.
rf_estimator_tuned = RandomForestClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
"min_samples_leaf": np.arange(5, 10),
"max_features": np.arange(0.2, 0.7, 0.1),
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(rf_estimator_tuned, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
rf_estimator_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
rf_estimator_tuned.fit(X_train, y_train)
RandomForestClassifier(max_features=0.2, min_samples_leaf=9, random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
rf_estimator_tuned_score=get_metrics_score(rf_estimator_tuned)
Accuracy on training set : 0.7781453240636914
Accuracy on test set : 0.7441130298273155
Recall on training set : 0.8954083774028372
Recall on test set : 0.8748285994123408
Precision on training set : 0.7973538645537449
Precision on test set : 0.7722635310392529
f1-score on training set : 0.8435411806571507
f1-score on test set : 0.8203526818515797
make_confusion_matrix_for_model(rf_estimator_tuned,y_test)
y.head()
case_id
EZYV01    0
EZYV02    1
EZYV03    0
EZYV04    0
EZYV05    1
Name: case_status, dtype: int64
y.value_counts()
1    17018
0     8462
Name: case_status, dtype: int64
The model performance is still modest, with a test accuracy of about 0.74. One possible reason is class imbalance: 17,018 cases (67%) are certified and 8,462 (33%) are denied.
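One common way to account for such an imbalance is to weight classes inversely to their frequency. The sketch below is an illustration rather than the notebook's method: it applies the formula scikit-learn documents for `class_weight="balanced"`, `n_samples / (n_classes * count)`, to the counts printed above.

```python
# Class counts from y.value_counts()
class_counts = {1: 17018, 0: 8462}

n_samples = sum(class_counts.values())
n_classes = len(class_counts)

# Formula documented for class_weight="balanced": n_samples / (n_classes * count)
balanced_weights = {
    label: n_samples / (n_classes * count) for label, count in class_counts.items()
}

for label, weight in sorted(balanced_weights.items()):
    share = class_counts[label] / n_samples
    print(f"class {label}: share={share:.2%}, balanced weight={weight:.3f}")
```

Note that this rule gives the larger weight to the minority class 0, which is the opposite direction from the `{0: 0.3, 1: 0.7}` weights used in the grid that follows.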
# Choose the type of classifier.
rf_estimator_weighted = RandomForestClassifier(random_state=1)
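# Grid of parameters to choose from
# Note: {0: 0.3, 1: 0.7} gives more weight to the majority class (1);
# to counter the imbalance, the minority class (0) would need the larger weight
parameters = {
"class_weight": [{0: 0.3, 1: 0.7}],
"min_samples_leaf": np.arange(5, 10),
"max_features": np.arange(0.2, 0.7, 0.1),
}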
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(rf_estimator_weighted, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
rf_estimator_weighted = grid_obj.best_estimator_
# Fit the best algorithm to the data.
rf_estimator_weighted.fit(X_train, y_train)
RandomForestClassifier(class_weight={0: 0.3, 1: 0.7}, max_features=0.2,
min_samples_leaf=9, random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
rf_estimator_weighted_score=get_metrics_score(rf_estimator_weighted)
Accuracy on training set : 0.7350302758466024
Accuracy on test set : 0.7042124542124543
Recall on training set : 0.9871568874338957
Recall on test set : 0.9698334965719883
Precision on training set : 0.7200146941774322
Precision on test set : 0.7014735052422783
f1-score on training set : 0.8326842738794874
f1-score on test set : 0.8141083614239908
make_confusion_matrix_for_model(rf_estimator_weighted,y_test)
importances = rf_estimator_weighted.feature_importances_
indices = np.argsort(importances)
feature_names = list(X.columns)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
abc_1 = AdaBoostClassifier(random_state=1)
abc_1.fit(X_train,y_train)
AdaBoostClassifier(random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
abc_score=get_metrics_score(abc_1)
Accuracy on training set : 0.7382260596546311
Accuracy on test set : 0.7352171637885924
Recall on training set : 0.8876017795685386
Recall on test set : 0.8854064642507345
Precision on training set : 0.7605005753739931
Precision on test set : 0.7585165296190636
f1-score on training set : 0.819150172367045
f1-score on test set : 0.8170643528561099
make_confusion_matrix_for_model(abc_1,y_test)
gbc_1 = GradientBoostingClassifier(random_state=1)
gbc_1.fit(X_train,y_train)
GradientBoostingClassifier(random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
gbc_score=get_metrics_score(gbc_1)
Accuracy on training set : 0.7564476339986544
Accuracy on test set : 0.7429356357927787
Recall on training set : 0.8800470074708302
Recall on test set : 0.8730656219392752
Precision on training set : 0.7824464512277035
Precision on test set : 0.7719085555940423
f1-score on training set : 0.8283817951959545
f1-score on test set : 0.8193767809541319
make_confusion_matrix_for_model(gbc_1,y_test)
xgb_1 = XGBClassifier(random_state=1,eval_metric='logloss')
xgb_1.fit(X_train,y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
colsample_bynode=1, colsample_bytree=1, eval_metric='logloss',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.300000012,
max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=100, n_jobs=8,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=1, subsample=1, tree_method='exact',
validate_parameters=1, verbosity=None)
#Using above defined function to get accuracy, recall and precision on train and test set
xgb_score=get_metrics_score(xgb_1)
Accuracy on training set : 0.8384166853554609
Accuracy on test set : 0.7328623757195186
Recall on training set : 0.9283136069839671
Recall on test set : 0.8550440744368266
Precision on training set : 0.8450370596775426
Precision on test set : 0.7702488088935945
f1-score on training set : 0.8847200000000001
f1-score on test set : 0.8104344597103601
make_confusion_matrix_for_model(xgb_1,y_test)
# Choose the type of classifier.
abc_tuned = AdaBoostClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
#Let's try different max_depth for base_estimator
"base_estimator":[DecisionTreeClassifier(max_depth=1, random_state=1),DecisionTreeClassifier(max_depth=2, random_state=1),DecisionTreeClassifier(max_depth=3, random_state=1),DecisionTreeClassifier(max_depth=4, random_state=1)],
"n_estimators": np.arange(10,110,20),
"learning_rate":np.arange(0.1,2,0.1)
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(abc_tuned, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
abc_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
abc_tuned.fit(X_train, y_train)
AdaBoostClassifier(base_estimator=DecisionTreeClassifier(max_depth=1,
random_state=1),
learning_rate=0.2, n_estimators=10, random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
abc_tuned_score=get_metrics_score(abc_tuned)
Accuracy on training set : 0.6919152276295133
Accuracy on test set : 0.6890371533228676
Recall on training set : 0.9709560983799211
Recall on test set : 0.9704211557296768
Precision on training set : 0.6919717635798038
Precision on test set : 0.6899721448467967
f1-score on training set : 0.8080617555625416
f1-score on test set : 0.8065120065120066
make_confusion_matrix_for_model(abc_tuned,y_test)
importances = abc_tuned.feature_importances_
indices = np.argsort(importances)
feature_names = list(X.columns)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
gbc_init = GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),random_state=1)
gbc_init.fit(X_train,y_train)
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
random_state=1)
#Using above defined function to get accuracy, recall and precision on train and test set
gbc_init_score=get_metrics_score(gbc_init)
Accuracy on training set : 0.7573446961202063
Accuracy on test set : 0.7438513867085296
Recall on training set : 0.8818097876269622
Recall on test set : 0.875024485798237
Precision on training set : 0.7824953445065177
Precision on test set : 0.7719025401762571
f1-score on training set : 0.8291893598547635
f1-score on test set : 0.8202350348879912
# Choose the type of classifier.
gbc_tuned = GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),random_state=1)
# Grid of parameters to choose from
# Note: the integer 1 in max_features means a single feature per split; 1.0 would mean all features
parameters = {
"n_estimators": [50,100,150,200,250],
"subsample":[0.8,0.9,1],
"max_features":[0.7,0.8,0.9,1]
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(gbc_tuned, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
gbc_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
gbc_tuned.fit(X_train, y_train)
GradientBoostingClassifier(init=AdaBoostClassifier(random_state=1),
max_features=1, n_estimators=50, random_state=1,
subsample=0.8)
#Using above defined function to get accuracy, recall and precision on train and test set
gbc_tuned_score=get_metrics_score(gbc_tuned)
Accuracy on training set : 0.7205090827539807
Accuracy on test set : 0.7091836734693877
Recall on training set : 0.9456056408964997
Recall on test set : 0.9475024485798237
Precision on training set : 0.7220228175874888
Precision on test set : 0.7121613663133097
f1-score on training set : 0.8188260948573505
f1-score on test set : 0.8131461713036899
make_confusion_matrix_for_model(gbc_tuned,y_test)
importances = gbc_tuned.feature_importances_
indices = np.argsort(importances)
feature_names = list(X.columns)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
# Choose the type of classifier.
xgb_tuned = XGBClassifier(random_state=1,eval_metric='logloss')
# Grid of parameters to choose from
# Note: scale_pos_weight > 1 up-weights the positive class, which is already the majority class here
parameters = {
"n_estimators": np.arange(10,100,20),
"scale_pos_weight":[0,1,2,5],
"subsample":[0.5,0.7,0.9,1],
"learning_rate":[0.01,0.1,0.2],
"colsample_bytree":[0.5,0.7,0.9],
"colsample_bylevel":[0.5,0.7,0.9]
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(xgb_tuned, parameters,scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
xgb_tuned = grid_obj.best_estimator_
# Fit the best algorithm to the data.
xgb_tuned.fit(X_train, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=0.5,
colsample_bynode=1, colsample_bytree=0.5, eval_metric='logloss',
gamma=0, gpu_id=-1, importance_type='gain',
interaction_constraints='', learning_rate=0.01, max_delta_step=0,
max_depth=6, min_child_weight=1, missing=nan,
monotone_constraints='()', n_estimators=10, n_jobs=8,
num_parallel_tree=1, random_state=1, reg_alpha=0, reg_lambda=1,
scale_pos_weight=5, subsample=0.5, tree_method='exact',
validate_parameters=1, verbosity=None)
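To get a feel for the cost of the grid search above, the number of parameter combinations and cross-validation fits can be counted before running it. The sketch below mirrors the grid from the cell above with plain lists (`range(10, 100, 20)` yields the same values as the `np.arange` call). The variable names are illustrative, and the single final refit that `GridSearchCV` performs by default is not included in the count.

```python
from math import prod

# Same candidate values as the XGBoost grid above, written as plain lists.
grid = {
    "n_estimators": list(range(10, 100, 20)),
    "scale_pos_weight": [0, 1, 2, 5],
    "subsample": [0.5, 0.7, 0.9, 1],
    "learning_rate": [0.01, 0.1, 0.2],
    "colsample_bytree": [0.5, 0.7, 0.9],
    "colsample_bylevel": [0.5, 0.7, 0.9],
}
cv_folds = 5

# Every combination is fitted once per cross-validation fold.
n_combinations = prod(len(values) for values in grid.values())
n_cv_fits = n_combinations * cv_folds

print(n_combinations, n_cv_fits)
```

If that count is too expensive, a randomized search over the same grid is one common alternative.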
# Use the function defined above to get accuracy, recall, precision and F1 on the train and test sets
xgb_tuned_score = get_metrics_score(xgb_tuned)
Accuracy on training set : 0.6679188158779995
Accuracy on test set : 0.6678440607012036
Recall on training set : 1.0
Recall on test set : 1.0
Precision on training set : 0.6679188158779995
Precision on test set : 0.6678440607012036
f1-score on training set : 0.8009008706174997
f1-score on test set : 0.8008471252647267
make_confusion_matrix_for_model(xgb_tuned,y_test)
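The tuned XGBoost scores above combine a recall of 1.0 with precision exactly equal to accuracy. That pattern is what a classifier produces when it labels every case as the positive class, which can happen when a search optimizes recall alone. A small sketch with toy labels (not the project data) shows the effect. `binary_metrics` is a helper written for this illustration, not the notebook's `get_metrics_score`.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, recall, precision) for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return accuracy, recall, precision


# Toy labels: 4 positives and 2 negatives.
y_true = [1, 1, 1, 1, 0, 0]
# A degenerate model that predicts the positive class for every case.
y_pred = [1] * len(y_true)

accuracy, recall, precision = binary_metrics(y_true, y_pred)
print(accuracy, recall, precision)
```

Here recall is perfect while accuracy and precision both collapse to the share of positives in the data, so such a model does no filtering at all.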
importances = xgb_tuned.feature_importances_
indices = np.argsort(importances)
feature_names = list(X.columns)
plt.figure(figsize=(12,12))
plt.title('Feature Importances')
plt.barh(range(len(indices)), importances[indices], color='violet', align='center')
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel('Relative Importance')
plt.show()
estimators_1 = [('AdaBoost', abc_1), ('Gradient Boost', gbc_1)]
final_estimator_1 = RandomForestClassifier(random_state=1)
stacking_estimator_1 = StackingClassifier(estimators=estimators_1, final_estimator=final_estimator_1, cv=5)
stacking_estimator_1.fit(X_train, y_train)
StackingClassifier(cv=5,
                   estimators=[('AdaBoost', AdaBoostClassifier(random_state=1)),
                               ('Gradient Boost',
                                GradientBoostingClassifier(random_state=1))],
                   final_estimator=RandomForestClassifier(random_state=1))
stacking_Score_1=get_metrics_score(stacking_estimator_1)
Accuracy on training set : 0.716640502354788
Accuracy on test set : 0.7008110936682366
Recall on training set : 0.8270796608746748
Recall on test set : 0.8154750244857982
Precision on training set : 0.7669494823694247
Precision on test set : 0.7558097312999273
f1-score on training set : 0.7958804523424878
f1-score on test set : 0.7845095637425799
# defining list of models
models = [dTree,d_Tree_limit3,estimator3, baggingClassifier_estimator,bagging_estimator_tuned,bagging_lr,rfClassifier_estimator,rf_estimator_tuned,
rf_estimator_weighted, abc_1, abc_tuned, gbc_1, gbc_init, gbc_tuned, xgb_1, xgb_tuned, stacking_estimator_1]
# defining empty lists to add train and test results
acc_train = []
acc_test = []
recall_train = []
recall_test = []
precision_train = []
precision_test = []
fscore_train = []
fscore_test = []
# looping through all the models to get the accuracy, recall, precision and F1 scores
for model in models:
    j = get_metrics_score(model, False)
    acc_train.append(np.round(j[0], 2))
    acc_test.append(np.round(j[1], 2))
    recall_train.append(np.round(j[2], 2))
    recall_test.append(np.round(j[3], 2))
    precision_train.append(np.round(j[4], 2))
    precision_test.append(np.round(j[5], 2))
    fscore_train.append(np.round(j[6], 2))
    fscore_test.append(np.round(j[7], 2))
comparison_frame = pd.DataFrame({
    'Model': [
        'Decision Tree', 'Decision tree with max_depth =3', 'Decision Tree Hypertuned',
        'baggingClassifier_estimator', 'bagging_estimator_tuned', 'bagging_logistic_regression',
        'Random Forest Classifier', 'Random Forest Classifier - tuned', 'Random Forest Classifier -weighted',
        'AdaBoost with default parameters', 'AdaBoost Tuned',
        'Gradient Boosting with default parameters', 'Gradient Boosting with init=AdaBoost',
        'Gradient Boosting Tuned', 'XGBoost with default parameters', 'XGBoost Tuned', 'Stacking Model',
    ],
    'Train_Accuracy': acc_train, 'Test_Accuracy': acc_test,
    'Train_Recall': recall_train, 'Test_Recall': recall_test,
    'Train_Precision': precision_train, 'Test_Precision': precision_test,
    'Train_f1score': fscore_train, 'Test_f1score': fscore_test,
})
comparison_frame
| | Model | Train_Accuracy | Test_Accuracy | Train_Recall | Test_Recall | Train_Precision | Test_Precision | Train_f1score | Test_f1score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Decision Tree | 1.00 | 0.66 | 1.00 | 0.74 | 1.00 | 0.75 | 1.00 | 0.74 |
| 1 | Decision tree with max_depth =3 | 0.73 | 0.72 | 0.93 | 0.93 | 0.74 | 0.73 | 0.82 | 0.82 |
| 2 | Decision Tree Hypertuned | 0.67 | 0.67 | 1.00 | 1.00 | 0.67 | 0.67 | 0.80 | 0.80 |
| 3 | baggingClassifier_estimator | 0.99 | 0.70 | 0.99 | 0.77 | 0.99 | 0.77 | 0.99 | 0.77 |
| 4 | bagging_estimator_tuned | 0.67 | 0.67 | 1.00 | 1.00 | 0.67 | 0.67 | 0.80 | 0.80 |
| 5 | bagging_logistic_regression | 0.67 | 0.67 | 1.00 | 1.00 | 0.67 | 0.67 | 0.80 | 0.80 |
| 6 | Random Forest Classifier | 1.00 | 0.72 | 1.00 | 0.84 | 1.00 | 0.77 | 1.00 | 0.80 |
| 7 | Random Forest Classifier - tuned | 0.78 | 0.74 | 0.90 | 0.87 | 0.80 | 0.77 | 0.84 | 0.82 |
| 8 | Random Forest Classifier -weighted | 0.74 | 0.70 | 0.99 | 0.97 | 0.72 | 0.70 | 0.83 | 0.81 |
| 9 | AdaBoost with default parameters | 0.74 | 0.74 | 0.89 | 0.89 | 0.76 | 0.76 | 0.82 | 0.82 |
| 10 | AdaBoost Tuned | 0.69 | 0.69 | 0.97 | 0.97 | 0.69 | 0.69 | 0.81 | 0.81 |
| 11 | Gradient Boosting with default parameters | 0.76 | 0.74 | 0.88 | 0.87 | 0.78 | 0.77 | 0.83 | 0.82 |
| 12 | Gradient Boosting with init=AdaBoost | 0.76 | 0.74 | 0.88 | 0.88 | 0.78 | 0.77 | 0.83 | 0.82 |
| 13 | Gradient Boosting Tuned | 0.72 | 0.71 | 0.95 | 0.95 | 0.72 | 0.71 | 0.82 | 0.81 |
| 14 | XGBoost with default parameters | 0.84 | 0.73 | 0.93 | 0.86 | 0.85 | 0.77 | 0.88 | 0.81 |
| 15 | XGBoost Tuned | 0.67 | 0.67 | 1.00 | 1.00 | 0.67 | 0.67 | 0.80 | 0.80 |
| 16 | Stacking Model | 0.72 | 0.70 | 0.83 | 0.82 | 0.77 | 0.76 | 0.80 | 0.78 |
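One possible way to read the comparison table is to discard models that overfit or that predict the positive class for every case, then rank the rest by test F1 and test recall. The sketch below applies that rule to the rounded values from the table. The 0.05 accuracy-gap threshold and the ranking key are assumptions made for illustration, not the selection rule used in this analysis.

```python
# (model, train_accuracy, test_accuracy, test_recall, test_f1) copied from the table above.
rows = [
    ("Decision Tree", 1.00, 0.66, 0.74, 0.74),
    ("Decision tree with max_depth =3", 0.73, 0.72, 0.93, 0.82),
    ("Decision Tree Hypertuned", 0.67, 0.67, 1.00, 0.80),
    ("baggingClassifier_estimator", 0.99, 0.70, 0.77, 0.77),
    ("bagging_estimator_tuned", 0.67, 0.67, 1.00, 0.80),
    ("bagging_logistic_regression", 0.67, 0.67, 1.00, 0.80),
    ("Random Forest Classifier", 1.00, 0.72, 0.84, 0.80),
    ("Random Forest Classifier - tuned", 0.78, 0.74, 0.87, 0.82),
    ("Random Forest Classifier -weighted", 0.74, 0.70, 0.97, 0.81),
    ("AdaBoost with default parameters", 0.74, 0.74, 0.89, 0.82),
    ("AdaBoost Tuned", 0.69, 0.69, 0.97, 0.81),
    ("Gradient Boosting with default parameters", 0.76, 0.74, 0.87, 0.82),
    ("Gradient Boosting with init=AdaBoost", 0.76, 0.74, 0.88, 0.82),
    ("Gradient Boosting Tuned", 0.72, 0.71, 0.95, 0.81),
    ("XGBoost with default parameters", 0.84, 0.73, 0.86, 0.81),
    ("XGBoost Tuned", 0.67, 0.67, 1.00, 0.80),
    ("Stacking Model", 0.72, 0.70, 0.82, 0.78),
]

MAX_ACCURACY_GAP = 0.05  # assumed tolerance for train/test accuracy difference

# Keep models that generalize and do not simply predict "positive" for everything.
shortlist = [
    row for row in rows
    if row[1] - row[2] <= MAX_ACCURACY_GAP and row[3] < 1.0
]

# Rank by test F1 first, then by test recall, highest first.
ranked = sorted(shortlist, key=lambda row: (row[4], row[3]), reverse=True)

for name, train_acc, test_acc, test_recall, test_f1 in ranked:
    print(f"{name}: test F1={test_f1:.2f}, test recall={test_recall:.2f}")
```

Different thresholds or a different ranking key (for example precision, if false approvals are costly) would produce a different ordering.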
OFLC can save considerable time and resources by pre-filtering or pre-sorting applications according to the model's predictions. It can begin with applications in which the candidate has a high level of education, job experience, and prevailing wage, and is employed in a region such as the Northeast. It could also raise the minimum requirements for prevailing wage, education, and job experience so that the applicant pool is weighted toward high-quality candidates who are more likely to be certified. Because the selected model combines high recall with reasonable accuracy, it misses few candidates who would be certified, so acting on its positive predictions reduces wasted effort while keeping the opportunity cost of losing good candidates to a minimum.